The Case for Local LLMs: Running Models on Mobile

Malik Chohra

April 12, 2026 · 5 min read

Privacy and cost are the biggest hurdles for AI apps. Learn how to connect React Native apps to local models via Ollama and WireAI.

Running LLMs locally via Ollama or LM Studio eliminates cloud API costs and keeps user data on-device. The WireAI OllamaAdapter connects your React Native app to a locally-running Llama 3 instance in two lines of code. The result: generative UI that works offline, costs nothing per request, and never sends sensitive data to a third party.

Not every prompt needs GPT-4. Component selection ("which UI element should I show next?") is a structured output task that 7B-parameter models handle well. If your app does health coaching, journaling, or financial tracking, the data is too sensitive for cloud APIs anyway. Local models fix both the cost problem and the compliance problem simultaneously.

Three reasons to choose local LLMs

  • Zero marginal cost: Cloud APIs bill per token. A power user sending 60 messages a day costs roughly $0.15/day on GPT-4o Mini, or about $54/year per active user, before infrastructure. A local model costs $0 per request (a back-of-envelope comparison follows this list).
  • Offline capability: Mobile apps lose connectivity constantly (underground, on airplanes, in poor-signal areas). A local LLM keeps your app fully functional regardless of network state. A cloud-only AI app becomes a broken app the moment the user goes offline.
  • Privacy and compliance: GDPR, HIPAA, and CCPA all become significantly simpler when data never leaves the device. Health apps, finance apps, and journal apps can avoid complex data processing agreements by keeping inference entirely local.
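
Here is the math behind those numbers as a back-of-envelope comparison. The per-request figure is an assumption derived from the $0.15/day estimate above; real costs depend on your prompt and completion sizes.

// Back-of-envelope: annual cloud cost per heavy user vs. a local model.
// Assumes ~$0.0025 per request on GPT-4o Mini (prompt + completion tokens
// for a generative-UI request); adjust for your actual prompt sizes.
const costPerRequestUSD = 0.0025;
const requestsPerDay = 60;

const cloudCostPerYear = costPerRequestUSD * requestsPerDay * 365; // ≈ $54.75
const localCostPerYear = 0; // inference runs on hardware you already have

console.log({ cloudCostPerYear, localCostPerYear });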

Can local models handle WireAI's component selection task?

The task is simpler than it looks. The model reads a list of registered components with descriptions and outputs a JSON object naming one component and its props; it does not need complex multi-step reasoning. On a registry of 11 components, Llama 3 8B achieves approximately an 89% Zod validation pass rate on the first attempt. WireAI's fallback layer handles the rest: the user sees a text response instead of a component, and the conversation continues.
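
To make that structured output concrete, here is a minimal sketch of the kind of Zod schema such a selection could be validated against. The component names and fields are illustrative, not WireAI's actual contract.

import { z } from 'zod';

// Illustrative schema: the model must name one registered component and its props.
const componentSelectionSchema = z.object({
  component: z.enum(['LineChart', 'ProgressRing', 'SummaryCard']), // example registered names
  props: z.record(z.string(), z.unknown()), // validated further per component
});

// Example of a response that would pass validation:
// { "component": "ProgressRing", "props": { "label": "Daily steps", "value": 0.72 } }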

For apps where reliability is critical, the cloud adapter (GPT-4o: 97%, Claude 3.5: 96%) is the right choice. For apps where privacy or cost dominates, Llama 3 8B's 89% rate with WireAI's built-in fallback is good enough to ship.
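
The fallback behavior described above can be pictured as a validate-or-degrade step. This is a sketch of the pattern only, reusing the illustrative schema above; it is not WireAI's internal implementation.

// Sketch of the validate-or-fall-back pattern: if the model's JSON fails
// schema validation, degrade to showing the raw text instead of a component.
function toRenderable(rawModelOutput: string) {
  try {
    const parsed = componentSelectionSchema.safeParse(JSON.parse(rawModelOutput));
    if (parsed.success) {
      return { kind: 'component' as const, selection: parsed.data };
    }
  } catch {
    // JSON.parse failed; fall through to the text fallback
  }
  return { kind: 'text' as const, text: rawModelOutput };
}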

Dev machine vs. self-hosted vs. on-device

"Local LLM" can mean three different things, and your architecture depends on your target audience:

  • Developer-facing apps: Your users run Ollama on their own machines. The app connects to http://localhost:11434. Zero infrastructure to manage.
  • Consumer apps, self-hosted: End users can't run Ollama. You host an Ollama instance on your own server. Users connect to your private inference endpoint. Data never reaches OpenAI or Anthropic. The baseUrl is your server IP.
  • True on-device inference: The model runs on the phone's Neural Engine. Practical only for small models (Phi-3 Mini, Gemma 2B) on recent high-end iPhones with 8GB+ RAM. Requires native modules and adds ~200MB to the app bundle. Not recommended for most apps yet.
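
One way to capture the first two modes is a small helper like the getOllamaBaseUrl() referenced in the hybrid example later in this post. This is a sketch: the production URL is a placeholder for your own self-hosted endpoint.

import { Platform } from 'react-native';

// Choose the Ollama endpoint based on deployment mode.
// __DEV__ targets a dev machine; the production URL is a placeholder for
// your own self-hosted inference server.
export function getOllamaBaseUrl(): string {
  if (__DEV__) {
    // Android emulators reach the host machine via 10.0.2.2; iOS simulators use localhost.
    return Platform.OS === 'android' ? 'http://10.0.2.2:11434' : 'http://localhost:11434';
  }
  return 'https://inference.example.com'; // replace with your self-hosted Ollama endpoint
}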

Connecting WireAI to Ollama

import { WireAIProvider, OllamaAdapter } from 'wireai-rn';
import { builtInComponents } from 'wireai-rn/components';
import { Platform } from 'react-native';
// RootNavigator is your app's own navigation tree; adjust the import path to match your project
import { RootNavigator } from './navigation/RootNavigator';

export default function App() {
  return (
    <WireAIProvider
      adapter={new OllamaAdapter({
        // Android emulator uses 10.0.2.2, iOS simulator uses localhost
        baseUrl: Platform.OS === 'android' ? 'http://10.0.2.2:11434' : 'http://localhost:11434',
        model: 'llama3:8b-instruct-q4_K_M',
      })}
      components={builtInComponents}
    >
      <RootNavigator />
    </WireAIProvider>
  );
}

For the full step-by-step setup including starting Ollama, pulling models, and fixing physical device networking, see the Ollama setup guide.

The hybrid production architecture

The most practical production pattern combines both worlds: local models for free-tier basic interactions (zero cost, offline-capable), cloud models for premium complex interactions (higher reliability, richer reasoning). WireAI's adapter pattern makes this a one-line config change based on the user's subscription tier.

// Import path for CloudAdapter assumed to mirror OllamaAdapter's; check your wireai-rn version
import { CloudAdapter, OllamaAdapter } from 'wireai-rn';

// `User` is your own auth/subscription type; all this helper needs is an isPro flag.
// getOllamaBaseUrl is the deployment-mode helper sketched earlier.
function getAdapter(user: User) {
  // Pro users get cloud model accuracy; free users get local model privacy + zero cost
  return user.isPro
    ? new CloudAdapter({ provider: 'openai', model: 'gpt-4o-mini' })
    : new OllamaAdapter({ baseUrl: getOllamaBaseUrl(), model: 'llama3' });
}
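
Wiring the helper into the provider might look like the sketch below; useCurrentUser is a stand-in for however your app exposes the signed-in user.

import React from 'react';
import { WireAIProvider } from 'wireai-rn';
import { builtInComponents } from 'wireai-rn/components';

function AppProviders({ children }: { children: React.ReactNode }) {
  const user = useCurrentUser(); // stand-in for your own auth/subscription hook
  return (
    <WireAIProvider adapter={getAdapter(user)} components={builtInComponents}>
      {children}
    </WireAIProvider>
  );
}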

See the full monetization guide for unit economics, pricing tier structures, and the RevenueCat integration pattern.


Build offline-first AI apps. Run npm install wireai-rn zod and ollama pull llama3.