Streaming LLM output in React Native makes AI responses feel instant. This guide covers three streaming architectures: SSE, WebSockets, and fetch ReadableStream, with working code for each and a performance comparison.
Streaming LLM responses in React Native means displaying tokens as they arrive rather than waiting for the full response. The three practical approaches are Server-Sent Events (SSE), WebSockets, and fetch ReadableStream. SSE is the easiest to implement and matches the format most AI provider APIs already emit. ReadableStream is natively supported in React Native 0.74+ and handles streaming fetch responses without a native module. WebSockets offer the lowest latency and support bidirectional real-time audio, but add complexity. For most React Native AI apps, SSE via fetch is the right starting point.
A 2-second wait while a spinner spins feels like a broken app. The same 2 seconds while text streams in character by character feels like a fast, responsive agent. The perceived performance improvement from streaming is dramatic: users consistently rate streamed responses as faster even when total latency is identical. If you are building an AI mobile app and not streaming, you are leaving the most impactful UX improvement on the table.
Why streaming is harder in React Native than on the web
Web browsers have native streaming support: fetch returns a ReadableStream, the EventSource API handles SSE, and the network layer operates in the browser's privileged process. React Native's JavaScript runtime (Hermes) historically had weaker streaming support because all network calls cross the native bridge.
This has improved significantly. React Native 0.74 shipped with better ReadableStream support in the Hermes engine. Expo SDK 51 and later use RN 0.74+, so most active projects have access to it. The caveats: EventSource is still not available natively in React Native (you need a polyfill or a different SSE approach), and ReadableStream behavior can be inconsistent across Hermes versions. This guide covers what actually works in production in 2026.
Architecture: where streaming happens
Before choosing a streaming method, understand the data path. The React Native app never calls the LLM API directly (see the security section of the Claude API guide). The streaming path is:
- LLM API (Anthropic/OpenAI) → streams tokens to your backend server.
- Backend server → forwards the stream to the mobile client using SSE, WebSocket, or a streaming HTTP response.
- React Native app → reads the stream and updates state as each chunk arrives, re-rendering the message in real time.
The streaming protocol you choose is between your backend and the app, not between the backend and the LLM API (that is always SSE for OpenAI-compatible APIs and Anthropic's streaming events).
Option 1: SSE via fetch (recommended starting point)
Server-Sent Events is a one-way HTTP protocol where the server pushes a stream of text events after the client makes a single HTTP request. It is the format that OpenAI, Anthropic, and most LLM APIs use for their own streaming endpoints. Forwarding this stream to the mobile app is the simplest possible architecture.
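On the wire, an SSE stream is just newline-delimited `data:` frames. For illustration, here is what the stream the hook below parses might look like; the `{"delta": ...}` payload shape is an assumption that matches the backend route later in this section:

```
data: {"delta":"Streaming"}

data: {"delta":" feels"}

data: {"delta":" instant"}

data: [DONE]
```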
React Native does not have a native EventSource implementation. The workaround that works reliably across Expo and bare React Native is to use a streaming fetch response and process the SSE lines manually:
```typescript
// hooks/useStreamingResponse.ts
import { useState, useCallback } from "react";

export function useStreamingResponse() {
  const [streamedText, setStreamedText] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);

  const stream = useCallback(async (userMessage: string) => {
    setStreamedText("");
    setIsStreaming(true);
    try {
      const response = await fetch("https://your-api.com/api/chat/stream", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ message: userMessage }),
      });
      if (!response.ok || !response.body) {
        throw new Error(`HTTP ${response.status}`);
      }
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = "";
      let finished = false;
      while (!finished) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        // SSE lines look like: "data: some token text\n\n"
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? ""; // keep incomplete line in buffer
        for (const line of lines) {
          if (line.startsWith("data: ")) {
            const data = line.slice(6).trim();
            if (data === "[DONE]") {
              // A bare `break` here would only exit the inner for loop
              // and keep reading; flag the outer loop as finished too
              finished = true;
              break;
            }
            try {
              const parsed = JSON.parse(data);
              // Adjust for your backend's response shape
              const token = parsed.delta ?? parsed.text ?? "";
              if (token) {
                setStreamedText((prev) => prev + token);
              }
            } catch {
              // Not JSON, treat as raw text token
              setStreamedText((prev) => prev + data);
            }
          }
        }
      }
    } catch (err) {
      console.error("Streaming error:", err);
    } finally {
      setIsStreaming(false);
    }
  }, []);

  return { streamedText, isStreaming, stream };
}
```

On the backend, forward the LLM stream as SSE. Here is a Next.js route that proxies Claude's streaming API:
```typescript
// app/api/chat/stream/route.ts
import Anthropic from "@anthropic-ai/sdk";
import { NextRequest } from "next/server";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function POST(req: NextRequest) {
  const { message, systemPrompt } = await req.json();
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      const send = (data: string) =>
        controller.enqueue(encoder.encode(`data: ${data}\n\n`));

      const stream = client.messages.stream({
        model: "claude-3-5-sonnet-20241022",
        max_tokens: 1024,
        system: systemPrompt,
        messages: [{ role: "user", content: message }],
      });

      for await (const event of stream) {
        if (
          event.type === "content_block_delta" &&
          event.delta.type === "text_delta"
        ) {
          send(JSON.stringify({ delta: event.delta.text }));
        }
      }

      send("[DONE]");
      controller.close();
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```
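With the backend in place, wiring the hook into a screen is straightforward. A minimal sketch; the component layout, file path, and styles here are illustrative, not from a specific library:

```tsx
// screens/ChatScreen.tsx -- illustrative wiring of useStreamingResponse
import React, { useState } from "react";
import { View, Text, TextInput, Button } from "react-native";
import { useStreamingResponse } from "../hooks/useStreamingResponse";

export function ChatScreen() {
  const [input, setInput] = useState("");
  const { streamedText, isStreaming, stream } = useStreamingResponse();

  return (
    <View style={{ flex: 1, padding: 16 }}>
      {/* Re-renders as each streamed chunk lands in state */}
      <Text>{streamedText}</Text>
      <TextInput
        value={input}
        onChangeText={setInput}
        placeholder="Ask something"
      />
      <Button
        title={isStreaming ? "Streaming..." : "Send"}
        disabled={isStreaming}
        onPress={() => stream(input)}
      />
    </View>
  );
}
```

Option 2: WebSocket streaming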
WebSockets are bidirectional and lower-latency than SSE, but they require a persistent connection and more complex server management. They are the right choice for real-time voice applications (see the voice AI guide) where the app simultaneously streams audio to the server and receives tokens back. For text-only chat, SSE is simpler and adequate.
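On the backend, a WebSocket server forwards tokens as they arrive. Here is a minimal sketch, assuming Node with the `ws` package and the same Anthropic SDK as the SSE route above; the `{ type, token }` message shape is a convention chosen to match the client hook below, not a standard protocol:

```typescript
// server/ws-stream.ts -- minimal sketch, not a production server
import { WebSocketServer } from "ws";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.on("message", async (raw) => {
    const { message } = JSON.parse(raw.toString());

    const stream = client.messages.stream({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 1024,
      messages: [{ role: "user", content: message }],
    });

    for await (const event of stream) {
      if (
        event.type === "content_block_delta" &&
        event.delta.type === "text_delta"
      ) {
        // { type: "token", token } is what the client hook parses
        socket.send(
          JSON.stringify({ type: "token", token: event.delta.text })
        );
      }
    }
    socket.send(JSON.stringify({ type: "done" }));
  });
});
```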
React Native has built-in WebSocket support. Here is the client-side pattern:
```typescript
// hooks/useWebSocketStream.ts
import { useState, useRef, useCallback } from "react";

export function useWebSocketStream(wsUrl: string) {
  const [streamedText, setStreamedText] = useState("");
  const [isConnected, setIsConnected] = useState(false);
  const wsRef = useRef<WebSocket | null>(null);

  const connect = useCallback(() => {
    const ws = new WebSocket(wsUrl);
    wsRef.current = ws;
    ws.onopen = () => setIsConnected(true);
    ws.onclose = () => setIsConnected(false);
    ws.onmessage = (event) => {
      const { type, token } = JSON.parse(event.data as string);
      if (type === "token") {
        setStreamedText((prev) => prev + token);
      } else if (type === "done") {
        // Stream complete
      }
    };
  }, [wsUrl]);

  const sendMessage = useCallback((text: string) => {
    setStreamedText("");
    wsRef.current?.send(JSON.stringify({ message: text }));
  }, []);

  // Close the socket when the screen unmounts to avoid leaked connections
  const disconnect = useCallback(() => {
    wsRef.current?.close();
    wsRef.current = null;
  }, []);

  return { streamedText, isConnected, connect, disconnect, sendMessage };
}
```

Rendering streamed text smoothly in React Native
Calling setState for every token means React re-renders the component on every token. At 20–40 tokens per second, this can cause jank on lower-end Android devices. Two optimizations:
1. Batch token updates. Buffer tokens for 50ms before flushing to state, rather than calling setState once per token. A 50ms flush interval caps renders at 20 per second no matter how fast tokens arrive, with no perceptible effect on streaming feel:
```typescript
// Batch token updates to reduce re-renders
// (lives inside the streaming hook, alongside setStreamedText)
const tokenBufferRef = useRef("");
const flushTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);

function handleToken(token: string) {
  tokenBufferRef.current += token;
  if (!flushTimerRef.current) {
    flushTimerRef.current = setTimeout(() => {
      // Read and reset the buffer before calling setState so the
      // updater stays pure even if React invokes it more than once
      const chunk = tokenBufferRef.current;
      tokenBufferRef.current = "";
      flushTimerRef.current = null;
      setStreamedText((prev) => prev + chunk);
    }, 50); // flush every 50ms
  }
}
```

2. Use a memoized message bubble component. Wrap your message list items in `React.memo` so that only the currently streaming message re-renders, not the entire message history:
```tsx
import React from "react";
import { View, Text, ActivityIndicator, StyleSheet } from "react-native";

const MessageBubble = React.memo(({ content, isStreaming }: {
  content: string;
  isStreaming: boolean;
}) => (
  <View style={styles.bubble}>
    <Text style={styles.text}>{content}</Text>
    {isStreaming && <ActivityIndicator size="small" />}
  </View>
));

// Placeholder styles so the component is self-contained
const styles = StyleSheet.create({
  bubble: { padding: 12, borderRadius: 12, marginVertical: 4 },
  text: { fontSize: 16 },
});
```

Streaming structured output: the WireAI pattern
Text streaming is straightforward. Structured JSON streaming, where you want to render a native component progressively as its props arrive, is harder. The challenge is that JSON is not valid until the closing brace arrives. Trying to parse partial JSON will always throw.
The WireAI approach is called "component-level streaming." Instead of streaming individual JSON characters, the backend accumulates the full JSON response (which takes 200–400ms for a component with 3–5 props), then dispatches it as a single message. For the streaming preview, WireAI shows a component skeleton (a pulsing placeholder in the component's shape) while the JSON accumulates, then snaps to the real component when the full response arrives. The result feels faster than a spinner even though no partial JSON is being rendered.
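A minimal sketch of the skeleton-then-snap pattern follows. The component names, props shape, and `loadProps` function are illustrative placeholders, not WireAI's actual API:

```tsx
import React, { useEffect, useState } from "react";
import { View, Text } from "react-native";

type CardProps = { title: string; body: string };

// Pulsing placeholder in the component's shape (static here for brevity)
const CardSkeleton = () => (
  <View style={{ padding: 16, borderRadius: 8, backgroundColor: "#eee" }}>
    <View style={{ height: 16, width: "60%", backgroundColor: "#ddd" }} />
    <View style={{ height: 12, marginTop: 8, backgroundColor: "#ddd" }} />
  </View>
);

const Card = ({ title, body }: CardProps) => (
  <View style={{ padding: 16, borderRadius: 8 }}>
    <Text style={{ fontWeight: "600" }}>{title}</Text>
    <Text>{body}</Text>
  </View>
);

// Show the skeleton immediately, then snap to the real component once
// the backend delivers the complete, parseable JSON payload.
export function StreamedCard({
  loadProps,
}: {
  loadProps: () => Promise<CardProps>;
}) {
  const [props, setProps] = useState<CardProps | null>(null);

  useEffect(() => {
    let cancelled = false;
    loadProps().then((p) => {
      if (!cancelled) setProps(p);
    });
    return () => {
      cancelled = true; // ignore late responses after unmount
    };
  }, [loadProps]);

  return props ? <Card {...props} /> : <CardSkeleton />;
}
```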
This is why the WireAI Pro tier describes "component skeleton streaming in under 200ms": the skeleton renders immediately on message dispatch, and the real component replaces it when the valid JSON arrives.
Streaming performance comparison
| Method | Time to first token | Notes | Best for |
| --- | --- | --- | --- |
| SSE via fetch | 80–120ms above base LLM latency | Works out of the box in RN 0.74+ | Text chat streaming |
| WebSocket | 30–60ms above base LLM latency | Requires persistent connection management | Voice + text bidirectional streaming |
| Polling (not recommended) | ~500ms average token delay, depending on poll interval | Simple to implement | Non-real-time summaries only |
For a Claude 3 Haiku response averaging 150 tokens, streaming saves approximately 1.2–1.8 seconds of perceived wait time versus waiting for the full response. For Claude 3.5 Sonnet responses averaging 250 tokens, the saving is 2–3 seconds. Users notice.
Common issues and fixes
- Stream hangs on Android: Older Hermes versions buffer the response before exposing the ReadableStream. Update to Expo SDK 51+ (RN 0.74) to fix this.
- Tokens arrive in chunks of 20–30 rather than one by one: Normal behavior when network conditions batch packets. Not a bug; your rendering logic should handle multi-token chunks.
- Stream does not end: Your backend is not sending a final `[DONE]` SSE event, or your `while (true)` loop is not checking `done === true` from the reader. Both are needed.
- Memory leak after navigation: Cancel the stream when the component unmounts. Call `reader.cancel()` in the `useEffect` cleanup function, as sketched after this list.
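A minimal sketch of that cleanup, assuming your streaming hook stores the active reader in a ref (the hook and ref names here are illustrative):

```typescript
import { useEffect, useRef } from "react";

export function useStreamCleanup() {
  // Your streaming hook would assign this after calling getReader():
  // readerRef.current = response.body.getReader();
  const readerRef =
    useRef<ReadableStreamDefaultReader<Uint8Array> | null>(null);

  useEffect(() => {
    return () => {
      // On unmount: stop the read loop and release the connection
      readerRef.current?.cancel().catch(() => {});
    };
  }, []);

  return readerRef;
}
```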
Streaming is table stakes for AI mobile apps in 2026. Start with the SSE pattern above, connect it to your WireAI component registry, and your users will feel the difference on the first response. Install the runtime with `npm install wireai-rn zod`.