Glossary

Streaming UI

Definition

Streaming UI is the pattern where components render and update as the LLM streams tokens, instead of waiting for the full response. On React Native the catch is the runtime: the built-in fetch is a polyfill over XMLHttpRequest and, under Hermes, does not expose the response body as a ReadableStream, so reading tokens off an HTTP response requires a streaming fetch polyfill or a native fetch implementation. The visible win is a UI that paints the component skeleton early and fills in props as they arrive, which removes the dead air that hurts perceived latency on local LLMs.
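You can check for this gap at runtime by testing whether a response exposes a readable body stream before you try to read it. The sketch below uses a hypothetical helper, canStreamBody, which does not come from any library. It takes a minimal structural type so it compiles without DOM or React Native typings.

```typescript
// Hypothetical helper: true when a fetch response exposes a readable body stream.
// Structural type so this compiles without DOM or React Native typings.
interface ResponseLike {
  body?: { getReader?: unknown } | null;
}

function canStreamBody(res: ResponseLike): boolean {
  // Stock React Native fetch leaves body undefined; browsers return a ReadableStream.
  return typeof res.body?.getReader === "function";
}
```

If you call it on the fetch result before reading, the app can fall back to awaiting res.text() when streaming is unavailable, so it does not fail outright.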

Example

On the web, response.body.getReader() just works. On React Native, the built-in fetch leaves response.body undefined, so you install a polyfill that builds fetch on top of XHR's incremental responses. Once the polyfill is loaded, the consumer code looks the same as on the web, apart from one React Native-specific opt-in flag.

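// React Native: the built-in fetch leaves response.body undefined, so a polyfill is required.
// react-native-polyfill-globals (with react-native-fetch-api) installs fetch, ReadableStream and TextDecoder.
import "react-native-polyfill-globals/auto";

// Async generator: yields decoded text chunks as they arrive.
async function* streamCompletion(url: string, body: object): AsyncGenerator<string> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify(body),
    reactNative: { textStreaming: true }, // react-native-fetch-api opt-in; not in the standard RequestInit type
  } as RequestInit);
  if (!res.ok || !res.body) {
    throw new Error(`Streaming request failed: HTTP ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte UTF-8 characters intact across chunk boundaries
      yield decoder.decode(value, { stream: true }); // hand each chunk to the renderer
    }
    const tail = decoder.decode(); // flush any bytes still buffered in the decoder
    if (tail) yield tail;
  } finally {
    reader.releaseLock();
  }
}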
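The streaming function above yields raw decoded text, and network chunk boundaries do not line up with server-sent event boundaries. One chunk can end in the middle of an event, and another can carry several events. The sketch below is a minimal buffering parser under a few assumptions. createSseParser is a hypothetical helper. Events are separated by a blank line, lines end in \n or \r\n, and only data: fields are handled.

```typescript
// Hypothetical helper: turns arbitrary text chunks into complete SSE `data:` payloads.
// Assumes \n or \r\n line endings and ignores fields other than `data:`.
function createSseParser(onData: (data: string) => void): (chunk: string) => void {
  let buffer = "";
  return function push(chunk: string): void {
    // Normalize the whole buffer so a \r\n split across two chunks is still handled.
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const rawEvent = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2); // keep any partial event for the next chunk
      const dataLines = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, "")); // drop the optional single space
      if (dataLines.length > 0) onData(dataLines.join("\n"));
      boundary = buffer.indexOf("\n\n");
    }
  };
}
```

In the streaming loop, each decoded chunk would go to push, and onData receives only whole payloads, which are safe to hand to the renderer. OpenAI-compatible servers commonly end the stream with a data: [DONE] payload. The callback should treat that payload as end-of-stream, not as content.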

The renderer takes each chunk, updates the in-flight component, and re-renders. The skeleton can be on screen as soon as the first chunk arrives, and props fill in as the LLM produces them.
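One way to model the in-flight component is a pure reducer. React's useReducer can drive it, so each dispatched chunk triggers a re-render. This is a sketch under assumptions: the state shape, the action names, and streamReducer itself are hypothetical. A real generative-UI renderer would merge structured props, not a single text field.

```typescript
// Hypothetical state for one in-flight component: skeleton until the first chunk lands.
type StreamStatus = "skeleton" | "streaming" | "done";

interface InFlightComponent {
  status: StreamStatus;
  text: string; // content accumulated so far
}

type StreamAction = { type: "chunk"; text: string } | { type: "done" };

const initialComponent: InFlightComponent = { status: "skeleton", text: "" };

function streamReducer(state: InFlightComponent, action: StreamAction): InFlightComponent {
  if (action.type === "chunk") {
    // Return a new object so React sees a new reference and re-renders.
    return { status: "streaming", text: state.text + action.text };
  }
  // "done": keep the accumulated text and mark the component complete.
  return { ...state, status: "done" };
}
```

Inside a component, const [state, dispatch] = useReducer(streamReducer, initialComponent) pairs with one dispatch({ type: "chunk", text }) per chunk. The component renders the skeleton while state.status is "skeleton".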

When to use it

Local LLM apps where total inference time is over a second and you need the user to see progress before the full response lands.
Long-form generation (summaries, drafts, multi-paragraph answers) where the user can read the first sentence while later sentences are still arriving.
Multi-component agent turns where each component should appear as the LLM decides on it, not after the whole reasoning chain completes.
Onboarding and chat flows where dead air over half a second is enough to make the experience feel broken.

When NOT to use it

Short responses under 200ms. When the full response arrives that fast, the polyfill and the per-chunk state updates add overhead and complexity without a visible latency gain.
Strict JSON outputs that must be validated before render. Partial JSON breaks parsers; wait for the full payload, then mount.
Components that mutate destructively on each update (checkboxes, toggles, animated counters) where intermediate states would look wrong to the user.
Apps targeting the React Native New Architecture, unless you have confirmed the XHR-based polyfill still works on that runtime. Test your streaming setup with the New Architecture and Bridgeless mode enabled before shipping.
Hard offline modes where the LLM runs synchronously inside a native module and streaming would only add an inter-thread hop without changing perceived latency.
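The strict-JSON case above calls for the opposite approach to streaming into the UI: read the whole stream, parse once, validate, and only then mount. The sketch below assumes the chunks arrive as an async iterable of strings, for example from the streaming function in the example, and that the caller supplies its own type guard. collectJson is a hypothetical name.

```typescript
// Hypothetical helper for the strict-JSON case: no partial parsing, no early mount.
async function collectJson<T>(
  chunks: AsyncIterable<string>,
  isValid: (value: unknown) => value is T,
): Promise<T> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk; // buffer everything; partial JSON would fail to parse
  }
  const parsed: unknown = JSON.parse(text); // throws SyntaxError on malformed JSON
  if (!isValid(parsed)) {
    throw new Error("Streamed payload failed validation");
  }
  return parsed; // narrowed to T by the type guard
}
```

The component mounts only after the returned promise resolves. A malformed or invalid payload rejects instead of rendering a half-built component.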

Related terms

Generative UI for Mobile (streaming UI is how generative components feel fast on mobile).
A2UI Protocol (A2UI envelopes can stream chunk-by-chunk over the same wire).
A2A protocol reference (for streaming over agent-to-agent transports).
