Ollama Adapter
Connect to a local Ollama server. Best for development and privacy-first apps: everything stays on device or on your local network.
Configuration
```ts
const llmConfig: LocalLLMConfig = {
  provider: "ollama",
  baseUrl: "http://localhost:11434",
  model: "llama3",       // or "llama3.1", "phi3", "mistral", etc.
  temperature: 0.7,      // optional: 0.0 to 1.0
  maxTokens: 1024,       // optional
  timeoutMs: 60_000,     // optional, default 60s
};
```
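For orientation, the object above could be described with an interface roughly like the one below. This is an illustrative sketch only; the authoritative LocalLLMConfig type is the one exported by the library.

```ts
// Rough sketch of the config shape used above; the authoritative
// LocalLLMConfig type is the one exported by the library.
interface LocalLLMConfig {
  provider: "ollama";
  baseUrl: string;       // e.g. "http://localhost:11434"
  model: string;         // any model you have pulled locally
  temperature?: number;  // 0.0 to 1.0
  maxTokens?: number;    // presumably mapped to Ollama's num_predict
  timeoutMs?: number;    // defaults to 60_000 (60s)
}
```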
Setup

1. Install Ollama. Download and install Ollama for your platform.
2. Pull a model: `ollama pull llama3`
3. Verify it's running: `curl http://localhost:11434/api/tags` (you can also check connectivity programmatically, as sketched below).
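If you prefer to verify from code rather than the shell, a minimal TypeScript check against the same /api/tags endpoint could look like the following; the endpoint is Ollama's, but the helper itself is only an illustration.

```ts
// Minimal connectivity check against Ollama's /api/tags endpoint.
// The endpoint is real; the helper itself is only an illustration.
async function isOllamaRunning(baseUrl = "http://localhost:11434"): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) return false;
    const body = await res.json();
    // A healthy server responds with { "models": [...] } listing pulled models.
    return Array.isArray(body.models);
  } catch {
    // A network error usually means the Ollama server is not running.
    return false;
  }
}
```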
Recommended models

| Model | Size | JSON quality | Best for |
|---|---|---|---|
| llama3 | 4.7 GB | ★★★★★ | Best instruction following, < 10 components |
| llama3.1:8b | 4.9 GB | ★★★★☆ | Good speed/quality balance |
| mistral | 4.1 GB | ★★★★☆ | Reliable JSON output |
| phi3:mini | 2.3 GB | ★★★☆☆ | Fastest, minimal context only |
Component count matters

Local models work best with fewer than 10 registered components; beyond that, JSON output quality degrades. Use `maxContextMessages: 10` and `maxContextChars: 6000` for local models (see the sketch below).
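As a concrete illustration, and assuming these two options sit on the same config object as the fields shown earlier, a local-model configuration with tightened context limits might look like this:

```ts
// Sketch of a local-model config with tightened context limits.
// It assumes maxContextMessages and maxContextChars live on the same
// config object as the other options; check the library's types.
const localConfig = {
  provider: "ollama",
  baseUrl: "http://localhost:11434",
  model: "llama3",
  maxContextMessages: 10, // keep the prompt short for small models
  maxContextChars: 6000,  // caps the serialized context size
};
```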
How it works under the hood
The adapter calls `POST /api/chat` with:
Request body
```json
{
  "model": "llama3",
  "messages": [...],
  "stream": false,
  "format": "json",
  "options": { "temperature": 0.7, "num_predict": 1024 }
}
```

The `format: "json"` flag forces Ollama to output valid JSON. The adapter pings `/api/tags` for connectivity checks.
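To make the request shape concrete, here is a minimal sketch of the same call issued directly with fetch. The endpoint, body fields, and options mirror the request above; the function name and response handling are illustrative, not the adapter's actual implementation.

```ts
// Illustrative sketch of the POST /api/chat call described above.
// The endpoint and body fields come from the request shown earlier;
// the surrounding function is not the adapter's actual code.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

async function chatOnce(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages,
      stream: false,    // single JSON response instead of a token stream
      format: "json",   // ask Ollama to emit valid JSON only
      options: { temperature: 0.7, num_predict: 1024 },
    }),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed: ${res.status}`);
  }
  // With stream: false, Ollama returns a single object whose
  // message.content holds the model's (JSON-formatted) reply.
  const data = await res.json();
  return data.message.content;
}
```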