Local LLM UI Rendering in React Native (Ollama + LMStudio)
How to render native UI from a local LLM in React Native: run Ollama or LMStudio on-device or on your LAN, return one validated component per turn, and ship a generative flow that works offline.
Local LLM UI rendering in React Native means a model running on-device or on your own LAN (through Ollama or LMStudio) returns JSON for one native component per turn, and the app renders it without a cloud round trip. Wire RN does this with two local adapters: you point the SDK at localhost:11434 (Ollama) or localhost:1234 (LMStudio), register your components with a Zod schema, and the validated component mounts natively. No API key, no network dependency, no data leaving the device or your network.
Most "local LLM on mobile" posts stop at running inference and printing a string. The interesting part starts after that: turning that string into a screen the user can tap. That is the gap this post fills.
Why render UI from a local model at all?
Because some flows should never touch a cloud API. A mental-health check-in, a journaling prompt, a medical intake form: those carry data a founder may not want sitting in OpenAI logs. A local model keeps the inference and the rendered UI on the device or inside your own network. You also get zero per-token cost and no rate limits, which matters when an onboarding flow fires a model call on every screen.
The tradeoff is real. Local models are slower and dumber than GPT-4o or Claude. On a Pixel 8 running a 3B model through Ollama on the same LAN, I see first-token latency of roughly 400 to 900 ms, not the 150 ms a hosted frontier model gives. For one-component-per-turn generative UI, that is fine. For long free-text reasoning, it is not. Pick the workload to fit. The broader case for on-device models is in the case for local LLMs on mobile.
What you need before starting
- Node 20 or newer and Expo SDK ~55.
- Ollama (`ollama serve`) or LMStudio running its local server, on your dev machine or a box on the same network.
- A small instruct model pulled: `ollama pull llama3.2:3b` works well for component selection.
- `wireai-rn` and `zod` installed.

```bash
npm install wireai-rn zod
ollama pull llama3.2:3b
ollama serve
```
One trap up front: the iOS simulator, the Android emulator, and a physical device do not see localhost the same way. The iOS simulator reaches your Mac at localhost, the Android emulator reaches your host at 10.0.2.2, and a real device needs your machine's LAN IP. Get this wrong and the request hangs with no error, which is exactly how I lost an afternoon.
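One way to avoid guessing is to make the choice explicit in code. The sketch below is a hypothetical helper, not part of wireai-rn; it assumes you already know which of the three runtimes the app is in, and it throws on a physical device with no LAN IP instead of silently falling back to localhost.

```typescript
// Hypothetical helper (not part of wireai-rn): pick the base URL for a local
// model server depending on where the app is running.
type Runtime = 'ios-simulator' | 'android-emulator' | 'physical-device';

function localModelBaseUrl(runtime: Runtime, port: number, lanIp?: string): string {
  switch (runtime) {
    case 'ios-simulator':
      // The iOS simulator shares the host Mac's network, so loopback works.
      return `http://localhost:${port}`;
    case 'android-emulator':
      // 10.0.2.2 is the Android emulator's alias for the host machine.
      return `http://10.0.2.2:${port}`;
    case 'physical-device':
      // A real phone cannot see the host's loopback; it needs the host's LAN IP.
      if (!lanIp) {
        throw new Error('A physical device needs the host LAN IP, not localhost.');
      }
      return `http://${lanIp}:${port}`;
  }
}
```

Pass 11434 for Ollama or 1234 for LMStudio. In an Expo app, `Platform.OS` from react-native combined with `Device.isDevice` from expo-device is one way to decide which runtime value to pass.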
Step 1: Point Wire RN at the local model
The Ollama adapter takes a base URL and a model name. Everything else (the system prompt, the component schemas, the validation) is the same code path a cloud adapter uses. That is the design choice that makes switching providers a one-line change.
```tsx
import { WireAIProvider, OllamaAdapter } from 'wireai-rn';
import { builtInComponents } from 'wireai-rn/components';

// Ollama's local server listens on port 11434 by default.
const ollama = new OllamaAdapter({
  // physical device: use your machine's LAN IP, not localhost
  baseUrl: 'http://192.168.1.42:11434',
  model: 'llama3.2:3b',
});

// OnboardingScreen is the screen built in Step 3.
export default function App() {
  return (
    <WireAIProvider adapter={ollama} components={builtInComponents}>
      <OnboardingScreen />
    </WireAIProvider>
  );
}
```

The full adapter options live in the Ollama adapter docs. If you are on LMStudio instead, the shape is identical: swap `OllamaAdapter` for the LMStudio adapter and change the port to 1234.
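To see what sits on the other end of the adapter, here is a sketch of the raw, non-streaming request each server accepts. It assumes Ollama's native `/api/chat` endpoint (with `format: 'json'` to request JSON output) and LMStudio's OpenAI-compatible `/v1/chat/completions` endpoint. This is not Wire RN's internal implementation; `buildChatRequest` and `readReply` are hypothetical names.

```typescript
// Sketch only: the raw HTTP calls a local adapter could make, not Wire RN's internals.
type Provider = 'ollama' | 'lmstudio';
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };
type PostRequest = {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
};

function buildChatRequest(
  provider: Provider,
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
): PostRequest {
  const isOllama = provider === 'ollama';
  // Ollama's native chat API vs LMStudio's OpenAI-compatible endpoint.
  const url = isOllama ? `${baseUrl}/api/chat` : `${baseUrl}/v1/chat/completions`;
  // stream: false returns one JSON response; format: 'json' asks Ollama for JSON output.
  const body = isOllama
    ? { model, messages, stream: false, format: 'json' }
    : { model, messages, stream: false };
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

type OllamaChatResponse = { message: { content: string } };
type OpenAIChatResponse = { choices: { message: { content: string } }[] };

// Pull the assistant text out of either response shape.
function readReply(data: OllamaChatResponse | OpenAIChatResponse): string {
  return 'message' in data ? data.message.content : data.choices[0].message.content;
}
```

With a real server you would pass `url` and `init` straight to `fetch`, then hand the parsed JSON body to `readReply`. Only the URL and one body field differ between the two providers, which is why swapping adapters can stay a one-line change.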
Step 2: Register a component the model can pick
Wire RN renders components from a registry. Each one has a name, a Zod schema for its props, and a description the SDK feeds into the system prompt so the model knows when to use it. The schema is the contract: if the local model returns props that do not match, the component is skipped instead of crashing the screen.
```tsx
import { registerComponent } from 'wireai-rn';
import { z } from 'zod';
import { MoodCheckInUI } from './components/MoodCheckInUI';

export const MoodCheckIn = registerComponent({
  // The name the model returns to pick this component.
  name: 'MoodCheckIn',
  // Fed into the system prompt so the model knows when to use it.
  description: 'Ask the user how they feel on a 1 to 5 scale with an optional note.',
  // The contract: props that fail this schema are rejected before render.
  schema: z.object({
    prompt: z.string(),
    scaleLabels: z.array(z.string()).length(5), // one label per point on the scale
    allowNote: z.boolean().default(true),
  }),
  render: ({ props, onSubmit }) => (
    <MoodCheckInUI
      prompt={props.prompt}
      scaleLabels={props.scaleLabels}
      allowNote={props.allowNote}
      onSubmit={onSubmit}
    />
  ),
});
```

Keep the schema tight. A small local model picks a component far more reliably when the schema is flat and the prop count is low. Five props is comfortable. Fifteen nested props is where a 3B model starts hallucinating keys. Writing your own components is covered in the custom-components guide.
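Why flat schemas help becomes clearer if you picture what the model actually reads. The sketch below shows one plausible way a registry could be turned into a system prompt; `RegistryEntry`, `buildSystemPrompt`, and the `{"component", "props"}` reply format are hypothetical illustrations, not Wire RN's actual prompt.

```typescript
// Hypothetical registry entry shape: just enough to describe a component to the model.
type RegistryEntry = {
  name: string;
  description: string;
  props: Record<string, string>; // prop name -> short type hint, kept flat on purpose
};

function buildSystemPrompt(registry: RegistryEntry[]): string {
  const catalog = registry.map((c) => {
    const props = Object.entries(c.props)
      .map(([key, type]) => `${key}: ${type}`)
      .join(', ');
    return `- ${c.name}: ${c.description} Props: { ${props} }`;
  });
  return [
    'You render exactly one UI component per turn.',
    'Reply with JSON only, shaped as {"component": "<name>", "props": { ... }}.',
    'Available components:',
    ...catalog,
  ].join('\n');
}

const prompt = buildSystemPrompt([
  {
    name: 'MoodCheckIn',
    description: 'Ask the user how they feel on a 1 to 5 scale with an optional note.',
    props: { prompt: 'string', scaleLabels: 'string[5]', allowNote: 'boolean' },
  },
]);
```

Every extra or nested prop lengthens both this catalog and the JSON the model has to reproduce exactly, which is a plausible reason a 3B model drifts on large schemas while handling three flat props comfortably.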
Step 3: Render the turn and validate the output
The render loop is the same one a cloud model drives. The local model returns a JSON object naming a component and its props, Wire RN validates it with Zod, and the matching native component mounts. One component per turn. The user answers, the answer goes back to the model, the next component arrives.
```tsx
import { View } from 'react-native';
import { useWireAIThread, WireAIMessageRenderer } from 'wireai-rn';

export function OnboardingScreen() {
  // messages: the thread so far; sendMessage: sends the user's answer back to the model.
  const { messages, sendMessage } = useWireAIThread();
  return (
    <View>
      {messages.map((m) => (
        // Each message mounts its validated component; a user action starts the next turn.
        <WireAIMessageRenderer key={m.id} message={m} onAction={sendMessage} />
      ))}
    </View>
  );
}
```

Validation matters more with a local model than a cloud one. A frontier model rarely fumbles a flat schema. A quantized 3B model on a phone will, maybe one turn in twenty, return a stray key or a number where a string belongs. Without a schema gate that is a crash in production. With one it is a silent fallback the user never sees.
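The gate itself is small. Here is a sketch of a parse-then-validate step using Zod's `safeParse`, assuming the model replies with a `{"component": ..., "props": ...}` object; that wire format, the `schemas` map, and `parseTurn` are hypothetical and may not match what Wire RN does internally. Returning `null` is what lets the caller fall back instead of crashing.

```typescript
import { z } from 'zod';

// Hypothetical registry: component name -> props schema (mirrors MoodCheckIn).
const schemas = new Map<string, z.ZodType>([
  [
    'MoodCheckIn',
    z.object({
      prompt: z.string(),
      scaleLabels: z.array(z.string()).length(5),
      allowNote: z.boolean().default(true),
    }),
  ],
]);

type Turn = { component: string; props: unknown };

// Returns a validated turn, or null so the caller can fall back instead of crashing.
function parseTurn(raw: string): Turn | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw); // small models occasionally emit malformed JSON
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const { component, props } = parsed as { component?: unknown; props?: unknown };
  if (typeof component !== 'string') return null;
  const schema = schemas.get(component);
  if (!schema) return null; // the model named a component that is not registered
  const result = schema.safeParse(props);
  // On success, defaults are filled in and unknown keys are stripped (z.object's default).
  return result.success ? { component, props: result.data } : null;
}
```

One detail worth knowing: a plain `z.object` strips a stray key rather than rejecting the turn, so only wrong types and missing required props trigger the fallback. Chain `.strict()` onto the object schema if you would rather reject any turn with unexpected keys.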
How does local rendering compare to a cloud adapter?
The code is nearly identical; the tradeoffs are not. The table below is the shape I use when deciding which adapter a given flow should run on. Numbers are from my own LAN dogfood testing in May 2026 on a Pixel 8 and an iPhone 13 talking to Ollama on a MacBook, not a controlled benchmark, so treat them as ballpark.
| | Local model (Ollama or LMStudio) | Cloud adapter |
|---|---|---|
| First-token latency | ~400 to 900 ms (3B over LAN) | ~150 ms (hosted frontier model) |
| Cost per turn | Zero | Per-token, which adds up when every onboarding screen fires a call |
| Data exposure | Stays on your hardware | Prompt and answers transit a third party |
| Component-pick accuracy | 3B is reliable on flat schemas, shaky on deep ones | A frontier model rarely misses either |
| Works offline | Yes (on-device or LAN) | No |
The practical takeaway: use a local model for the privacy-sensitive, high-frequency, simple-pick turns, and reach for a cloud model when a turn needs real reasoning. Because the adapter is the only thing that changes, you can mix them in one app. Set a cloud adapter as the default and swap in OllamaAdapter for the screens that must stay private.
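The routing decision itself can be a one-liner. A minimal sketch, assuming you keep a set of privacy-sensitive screen names; `pickAdapter` and the route names are hypothetical, and it is generic so it works with `OllamaAdapter` and whichever cloud adapter you use without assuming their types.

```typescript
// Hypothetical routing helper: private screens get the local adapter,
// everything else gets the cloud default.
function pickAdapter<A>(
  route: string,
  adapters: { local: A; cloud: A },
  privateRoutes: ReadonlySet<string>,
): A {
  return privateRoutes.has(route) ? adapters.local : adapters.cloud;
}

// Hypothetical route names for the check-in, journaling, and intake flows.
const privateRoutes: ReadonlySet<string> = new Set(['MoodCheckIn', 'Journal', 'MedicalIntake']);
```

How the chosen adapter reaches the screen is an assumption here: one option is a `WireAIProvider` per screen with `adapter={pickAdapter(routeName, { local: ollama, cloud: cloudAdapter }, privateRoutes)}`, where `cloudAdapter` stands for whatever cloud adapter the app already uses. Keeping the list as data makes the privacy boundary easy to audit in one place.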
What I learned shipping this on a real device
After 9 years on React Native, the bug that still costs me the most time is the one that fails silently, and local LLM rendering has a few. The biggest: a request to localhost from a physical device does not error, it just never resolves. I sat watching a spinner for twenty minutes before I remembered the device cannot see my Mac's loopback. Switching to the LAN IP fixed it instantly.
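The fix for a silent hang is a deadline. A minimal sketch of a timeout wrapper built on `AbortController` and `Promise.race`; `fetchWithTimeout` is a hypothetical helper, and the `fetchImpl` parameter exists only so it can be exercised without a network.

```typescript
// Hypothetical helper: fail loudly when a local model server never answers.
type FetchLike<T> = (url: string, init: { signal: AbortSignal }) => Promise<T>;

async function fetchWithTimeout<T>(
  url: string,
  timeoutMs: number,
  fetchImpl: FetchLike<T>,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      // Reject first so the caller sees this message, then cancel the request.
      reject(new Error(`No response from ${url} after ${timeoutMs} ms; on a physical device, use the host's LAN IP.`));
      controller.abort();
    }, timeoutMs);
  });
  try {
    // Whichever settles first wins: the response or the timeout.
    return await Promise.race([fetchImpl(url, { signal: controller.signal }), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

Against a real server this would be called as `fetchWithTimeout(url, 8000, (u, init) => fetch(u, init))`. The 8-second budget is an arbitrary example; set it above the time a cold model load takes, or the first request after `ollama serve` will trip it.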
The second lesson was about model size. I started with a 7B model because bigger felt safer. On a mid-range Android over LAN it was slow enough that the onboarding flow felt broken. Dropping to a 3B instruct model cut latency by more than half and the component picks did not get noticeably worse, because picking from a registry is a constrained task, not open reasoning. Small wins here.
Common pitfalls
- `localhost` on a real device. Use the LAN IP for a physical phone and `10.0.2.2` for the Android emulator; `localhost` only works in the iOS simulator.
- No streaming on Hermes by default. If you stream local tokens, you hit the same missing `ReadableStream` wall every cloud SDK hits. Wire RN's XHR + SSE layer handles it; the detail is in the Hermes ReadableStream fix.
- Model too big. A 7B model on a mid-range phone over LAN is borderline. Start at 3B for component selection and only scale up if the picks get sloppy.
- Cold start. The first request after `ollama serve` (or after the model has sat idle past Ollama's keep-alive window, five minutes by default) loads the model into memory and can take several seconds. Warm it on app launch with a throwaway call.
- Cleartext HTTP blocked on iOS. A plain `http://` LAN endpoint can trip App Transport Security in a release build. Allow the local domain explicitly in dev, and front the model with HTTPS for anything shipped.
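For the cold-start item, the throwaway call can be a single request. Ollama's `/api/generate` endpoint loads a model into memory when called with a model name and no prompt, and `keep_alive` sets how long it stays loaded afterwards. A minimal sketch; `warmUpOllama` is a hypothetical helper and the 10-minute `keep_alive` is an arbitrary choice.

```typescript
// Hypothetical helper: ask Ollama to load a model before the first real turn.
type PostJson = (url: string, body: unknown) => Promise<void>;

// Default transport using the global fetch (available in React Native).
const postJson: PostJson = async (url, body) => {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
};

async function warmUpOllama(
  baseUrl: string,
  model: string,
  post: PostJson = postJson,
): Promise<boolean> {
  try {
    // No prompt: Ollama just loads the model into memory and returns.
    // keep_alive controls how long it stays loaded after this request.
    await post(`${baseUrl}/api/generate`, { model, keep_alive: '10m' });
    return true;
  } catch {
    // A failed warm-up should not block launch; the first real turn just pays the cold start.
    return false;
  }
}
```

Fire it once at launch without awaiting it, for example `void warmUpOllama('http://192.168.1.42:11434', 'llama3.2:3b')` inside a `useEffect` in `App`, so the model loads while the user is still on the first screen.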
FAQ
Can a local LLM really drive a React Native UI offline?
Yes, if the model runs on the device or on a machine the device can reach. With Ollama or LMStudio the model returns JSON for one component, Wire RN validates and renders it, and no cloud call happens. True on-device inference (model inside the app bundle) is heavier; LAN-hosted Ollama is the practical middle ground today.
Which local model should I use for component selection?
A 3B instruct model like Llama 3.2 3B is the sweet spot. Component selection is a constrained classification task, not open-ended reasoning, so a small model handles it well when the schemas are flat. Running Llama specifically is covered in running Llama 3 in React Native.
Is local rendering slower than a cloud API?
First-token latency is higher (roughly 400 to 900 ms in my LAN tests versus around 150 ms for a hosted frontier model). For one-component-per-turn generative UI that is acceptable. For long free-text generation it is not, so match the workload to the model.
Keep the inference and the UI on your own hardware. Run `npm install wireai-rn zod` and start from the Wire RN quick-start. The source is on GitHub.