How to connect Llama 3 8B to a React Native app via Ollama and WireAI, handle iOS/Android networking quirks, and render generative UI components from local model outputs, all without an API key.
Running Llama 3 in a React Native app is simpler than it sounds. Install Ollama on your dev machine, point WireAI's OllamaAdapter at the right base URL for your environment, and the 8B model reliably selects and populates native components from your registry: completely offline, with zero API cost.
The promise of on-device AI has been around for years, but practical mobile implementations were painful. Now, with quantized Llama 3 running in Ollama and a purpose-built React Native SDK, you can ship a fully local AI app in an afternoon.
Why use Llama 3 over a cloud API?
Three reasons: cost, privacy, and offline capability. Cloud LLMs bill per token, and a power user sending 50 messages per day adds up fast. Llama 3 via Ollama has zero marginal cost. For health, finance, or journaling apps, keeping data local also gives you a far simpler GDPR compliance story, because no third-party provider ever sees the content.
Which Llama 3 variant works best with WireAI?
For the WireAI component selection task you don't need the largest model. The job is simple: read a list of components, pick one, output valid JSON. Llama 3 8B handles this consistently in its 4-bit quantized form.
- llama3:8b-instruct-q4_K_M: recommended. Fast (~400ms on an M1 Mac), reliable JSON output, ~5GB RAM.
- llama3:70b-instruct-q4_K_M: for complex multi-step agents. Needs 40GB+ RAM.
- phi3:mini: fastest (~200ms). Use when latency matters more than reasoning depth.
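To make the selection task concrete, here is a minimal sketch of what a component-selection call looks like against Ollama's /api/generate endpoint. The component names, prompt, and response schema are illustrative rather than WireAI's internal format; in practice the OllamaAdapter configured below handles this call for you.

```ts
import { z } from 'zod';

// The flat shape we expect back: one component name plus a props object.
const Selection = z.object({
  component: z.string(),
  props: z.record(z.unknown()),
});

// Hypothetical helper: ask Llama 3 to pick a component for a user message.
export async function pickComponent(baseUrl: string, userMessage: string) {
  const prompt = [
    'Available components: BarChart, RecipeCard, TodoList.',
    'Pick exactly one and reply with JSON: {"component": "...", "props": {...}}.',
    `User request: ${userMessage}`,
  ].join('\n');

  const res = await fetch(`${baseUrl}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3:8b-instruct-q4_K_M',
      prompt,
      format: 'json', // constrain Ollama to return syntactically valid JSON
      stream: false,
    }),
  });

  const data = await res.json();
  return Selection.parse(JSON.parse(data.response));
}
```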
How do you set up Ollama for React Native?
The most common mistake is forgetting to expose Ollama beyond localhost. By default, Ollama listens only on 127.0.0.1, which physical devices on your network can't reach. Fix this before starting:
```bash
# Expose Ollama to your LAN for physical device testing
OLLAMA_HOST=0.0.0.0 ollama serve

# Pull the recommended model
ollama pull llama3:8b-instruct-q4_K_M
```
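Before wiring up the app, confirm that a device on your network can actually reach Ollama. One quick check is to fetch /api/tags, which lists the models Ollama has pulled; the IP below is a placeholder for your dev machine's LAN address.

```ts
// Replace 192.168.1.50 with your dev machine's LAN IP.
async function checkOllamaReachable() {
  const res = await fetch('http://192.168.1.50:11434/api/tags');
  const { models } = await res.json();
  console.log('Ollama reachable. Models:', models.map((m: { name: string }) => m.name));
}

checkOllamaReachable().catch((err) => console.error('Cannot reach Ollama:', err));
```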
How do you connect WireAI to Ollama in Expo?
The WireAI OllamaAdapter needs the correct base URL depending on your test environment. iOS Simulator shares the host machine's network stack. Android Emulator routes through a virtual NAT at 10.0.2.2. Physical devices need your LAN IP.
```tsx
import { WireAIProvider, OllamaAdapter } from 'wireai-rn';
import { Platform } from 'react-native';
import { builtInComponents } from 'wireai-rn/components';
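// ChatScreen is your own chat UI component (not shown here).
// Base URL per environment: the iOS Simulator shares the host's network stack
// (localhost), the Android Emulator reaches the host via 10.0.2.2, and a
// physical device needs your machine's LAN IP, e.g. 'http://192.168.1.50:11434'.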
const getBaseUrl = () => {
  if (Platform.OS === 'android') return 'http://10.0.2.2:11434';
  return 'http://localhost:11434';
};

export default function App() {
  return (
    <WireAIProvider
      adapter={new OllamaAdapter({
        baseUrl: getBaseUrl(),
        model: 'llama3:8b-instruct-q4_K_M',
      })}
      components={builtInComponents}
    >
      <ChatScreen />
    </WireAIProvider>
  );
}
```

Why local models need the flat component model
Web generative UI frameworks often ask the LLM to generate deeply nested UI trees. That works with GPT-4o, but smaller models like Llama 3 8B fail on complex nested schemas. WireAI's flat component model sidesteps the problem entirely: the model picks one component and fills in its props. For a registry of 11 components, the hallucination rate drops from roughly 30% to under 5%.
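To see why the flat contract is easier for a small model, compare the two output shapes below. Both are illustrative sketches, not WireAI's exact wire format.

```ts
// A nested UI tree: the model must get every level, key, and child array right in one shot.
const nestedOutput = {
  type: 'Card',
  children: [
    { type: 'Row', children: [{ type: 'Text', props: { value: 'Weekly total' } }] },
    { type: 'Chart', props: { values: [12, 40, 9] } },
  ],
};

// The flat contract: one component name from the registry plus a single props object.
const flatOutput = {
  component: 'StatCard',
  props: { label: 'Weekly total', value: 61 },
};
```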
What is the expected latency on a physical device?
On a physical iPhone connected to a Mac mini via WiFi, Llama 3 8B typically returns the first component JSON in 350–600ms. The bottleneck is network round-trip, not inference time. Show a loading skeleton, then swap in the rendered component when the JSON arrives.
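A minimal sketch of that skeleton-then-swap pattern is below. The component name and props are placeholders; how you receive the generated component depends on your chat screen.

```tsx
import React from 'react';
import { ActivityIndicator, View } from 'react-native';

// Shows a placeholder while Llama 3 produces the component JSON,
// then renders the generated component once it arrives.
export function GenerativeSlot({ loading, children }: {
  loading: boolean;
  children: React.ReactNode;
}) {
  if (loading) {
    return (
      <View
        style={{
          height: 120,
          borderRadius: 12,
          backgroundColor: '#e5e5e5',
          justifyContent: 'center',
        }}
      >
        <ActivityIndicator />
      </View>
    );
  }
  return <>{children}</>;
}
```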
Start building local AI apps for free: run npm install wireai-rn zod, then ollama pull llama3.