
Ollama Adapter

Connect to a local Ollama server. Best for development and privacy-first apps: everything stays on device or on your local network.

Configuration

Ts
const llmConfig: LocalLLMConfig = {
  provider: "ollama",
  baseUrl: "http://localhost:11434",
  model: "llama3",         // or "llama3.1", "phi3", "mistral", etc.
  temperature: 0.7,        // optional: 0.0 to 1.0
  maxTokens: 1024,         // optional
  timeoutMs: 60_000,       // optional, default 60s
};
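For reference, here is a sketch of the shape that config implies. The field names come from the example above; the interface itself is a reconstruction, not the library's exported type, which may cover more providers and options:

Ts
// Hypothetical reconstruction of the config shape used above; the real
// LocalLLMConfig exported by the library may be broader.
interface LocalLLMConfig {
  provider: "ollama";
  baseUrl: string;       // e.g. "http://localhost:11434"
  model: string;         // any model pulled into the local Ollama server
  temperature?: number;  // 0.0 to 1.0
  maxTokens?: number;
  timeoutMs?: number;    // defaults to 60_000
}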

Setup

1. Install Ollama
   Download and install Ollama for your platform.

2. Pull a model
   ollama pull llama3

3. Verify it's running
   curl http://localhost:11434/api/tags
   (A programmatic version of this check is sketched below.)
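If you'd rather verify from code, a small check against Ollama's /api/tags endpoint works just as well; it returns the models available on the local server (the response shape in the comment is Ollama's documented format):

Ts
// Quick connectivity check against a local Ollama server.
// GET /api/tags returns { "models": [{ "name": "llama3:latest", ... }, ...] }.
async function checkOllama(baseUrl = "http://localhost:11434"): Promise<void> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  }
  const { models } = (await res.json()) as { models: { name: string }[] };
  console.log("Available models:", models.map((m) => m.name).join(", "));
}

checkOllama().catch((err) => console.error(err));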

Recommended models

Model       | Size   | JSON quality | Best for
llama3      | 4.7 GB | ★★★★★        | Best instruction following, < 10 components
llama3.1:8b | 4.9 GB | ★★★★☆        | Good speed/quality balance
mistral     | 4.1 GB | ★★★★☆        | Reliable JSON output
phi3:mini   | 2.3 GB | ★★★☆☆        | Fastest, minimal context only
Note: component count matters. Local models work best with fewer than 10 registered components; beyond that, JSON output quality degrades. Use maxContextMessages: 10 and maxContextChars: 6000 for local models.
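Assuming maxContextMessages and maxContextChars sit on the same config object as the other options (an assumption based on the note above, not documented placement), a conservative local-model setup might look like this:

Ts
// Conservative settings for a small local model.
// NOTE: placing maxContextMessages/maxContextChars here is an assumption;
// check the library's types for where these limits actually live.
const localConfig = {
  provider: "ollama",
  baseUrl: "http://localhost:11434",
  model: "phi3:mini",
  temperature: 0.2,          // lower temperature tends to help JSON stability
  maxTokens: 512,
  maxContextMessages: 10,    // trim conversation history
  maxContextChars: 6000,     // cap prompt size for small context windows
};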

How it works under the hood

The adapter calls POST /api/chat with:

Request body
{
  "model": "llama3",
  "messages": [...],
  "stream": false,
  "format": "json",
  "options": { "temperature": 0.7, "num_predict": 1024 }
}

The format: "json" flag tells Ollama to constrain its output to valid JSON. For connectivity checks, the adapter calls GET /api/tags. A rough sketch of the chat request, issued directly with fetch, is shown below.
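This sketch follows Ollama's documented API (the /api/chat endpoint, the body fields above, and the message.content field on the non-streaming response); the adapter's actual implementation will differ in details such as error handling, timeouts, and retries:

Ts
// Minimal sketch of a non-streaming chat call with JSON-constrained output.
async function chatJson(
  baseUrl: string,
  model: string,
  messages: { role: "system" | "user" | "assistant"; content: string }[],
): Promise<unknown> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages,
      stream: false,                     // single response instead of chunks
      format: "json",                    // constrain output to valid JSON
      options: { temperature: 0.7, num_predict: 1024 },
    }),
  });
  if (!res.ok) throw new Error(`Ollama error: HTTP ${res.status}`);
  // Non-streaming responses put the generated text in message.content.
  const data = (await res.json()) as { message: { content: string } };
  return JSON.parse(data.message.content);
}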