
Ollama Adapter

Connect to a local Ollama server. Best for development and privacy-first apps: everything stays on device or on your local network.

Configuration

Ts
const llmConfig: LocalLLMConfig = {
  provider: "ollama",
  baseUrl: "http://localhost:11434",
  model: "llama3",         // or "llama3.1", "phi3", "mistral", etc.
  temperature: 0.7,        // optional: 0.0 to 1.0
  maxTokens: 1024,         // optional
  timeoutMs: 60_000,       // optional, default 60s
};
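For reference, here is a sketch of the shape that config implies. The field names come from the example above; the interface itself is a reconstruction, not the library's exported type, which may cover more providers and options:

Ts
// Hypothetical reconstruction of the config shape used above; the real
// LocalLLMConfig exported by the library may be broader.
interface LocalLLMConfig {
  provider: "ollama";
  baseUrl: string;       // e.g. "http://localhost:11434"
  model: string;         // any model pulled into the local Ollama server
  temperature?: number;  // 0.0 to 1.0
  maxTokens?: number;
  timeoutMs?: number;    // defaults to 60_000
}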

Setup

1. Install Ollama
   Download and install Ollama for your platform.

2. Pull a model
   ollama pull llama3

3. Verify it's running
   curl http://localhost:11434/api/tags
   (A programmatic version of this check is sketched below.)
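If you'd rather verify from code, a small check against Ollama's /api/tags endpoint works just as well; it returns the models available on the local server (the response shape in the comment is Ollama's documented format):

Ts
// Quick connectivity check against a local Ollama server.
// GET /api/tags returns { "models": [{ "name": "llama3:latest", ... }, ...] }.
async function checkOllama(baseUrl = "http://localhost:11434"): Promise<void> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  }
  const { models } = (await res.json()) as { models: { name: string }[] };
  console.log("Available models:", models.map((m) => m.name).join(", "));
}

checkOllama().catch((err) => console.error(err));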

Recommended models

Model       | Size   | JSON quality | Best for
llama3      | 4.7 GB | ★★★★★        | Best instruction following, < 10 components
llama3.1:8b | 4.9 GB | ★★★★☆        | Good speed/quality balance
mistral     | 4.1 GB | ★★★★☆        | Reliable JSON output
phi3:mini   | 2.3 GB | ★★★☆☆        | Fastest, minimal context only
Note: component count matters. Local models work best with fewer than 10 registered components; beyond that, JSON output quality degrades. Use maxContextMessages: 10 and maxContextChars: 6000 for local models.
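Assuming maxContextMessages and maxContextChars sit on the same config object as the other options (an assumption based on the note above, not documented placement), a conservative local-model setup might look like this:

Ts
// Conservative settings for a small local model.
// NOTE: placing maxContextMessages/maxContextChars here is an assumption;
// check the library's types for where these limits actually live.
const localConfig = {
  provider: "ollama",
  baseUrl: "http://localhost:11434",
  model: "phi3:mini",
  temperature: 0.2,          // lower temperature tends to help JSON stability
  maxTokens: 512,
  maxContextMessages: 10,    // trim conversation history
  maxContextChars: 6000,     // cap prompt size for small context windows
};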

How it works under the hood

The adapter calls POST /api/chat with:

Request body
{
  "model": "llama3",
  "messages": [...],
  "stream": false,
  "format": "json",
  "options": { "temperature": 0.7, "num_predict": 1024 }
}

The format: "json" flag tells Ollama to constrain its output to valid JSON. For connectivity checks, the adapter calls GET /api/tags. A rough sketch of the chat request, issued directly with fetch, is shown below.
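This sketch follows Ollama's documented API (the /api/chat endpoint, the body fields above, and the message.content field on the non-streaming response); the adapter's actual implementation will differ in details such as error handling, timeouts, and retries:

Ts
// Minimal sketch of a non-streaming chat call with JSON-constrained output.
async function chatJson(
  baseUrl: string,
  model: string,
  messages: { role: "system" | "user" | "assistant"; content: string }[],
): Promise<unknown> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages,
      stream: false,                     // single response instead of chunks
      format: "json",                    // constrain output to valid JSON
      options: { temperature: 0.7, num_predict: 1024 },
    }),
  });
  if (!res.ok) throw new Error(`Ollama error: HTTP ${res.status}`);
  // Non-streaming responses put the generated text in message.content.
  const data = (await res.json()) as { message: { content: string } };
  return JSON.parse(data.message.content);
}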