Streaming LLM Tokens to the Browser: The Production SSE Setup

A spinner is a lie. It tells the user something is happening without telling them what. When spectr-ai generates a security report, the LLM produces text token by token over 15 to 40 seconds. If I wait for the full response and then drop it on the page, the user stares at nothing the whole time. If I stream each token as it arrives, the report writes itself in front of them, exactly like ChatGPT. Same wait, completely different feel.

A while back I covered SSE for progress bars: the server sends a handful of step and progress events, the client moves a bar. This is the token-streaming version. Instead of a few discrete progress events, the server forwards hundreds of text fragments coming out of the model in real time. The transport is the same (Server-Sent Events over a fetch stream), but the source, the parsing, and the failure modes are different.
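Framing is the first place this gets harder. A progress payload is a number. A model fragment is arbitrary text that can contain newlines, and SSE treats each line as a separate field and a blank line as the end of an event. Here is a minimal sketch of the problem and one common fix. The `{ token }` JSON payload is a convention assumed for illustration; the SSE spec does not require it.

```typescript
// Naive framing: a token containing "\n" breaks the event. SSE reads each line
// as its own field, so text after the newline (no "data:" prefix) is ignored,
// and "\n\n" inside a token ends the event early.
function naiveFrame(token: string): string {
  return `data: ${token}\n\n`;
}

// Safer framing (assumed convention): JSON-encode the fragment so any newline
// travels as the two characters "\n" and the event stays on one data line.
function jsonFrame(token: string): string {
  return `data: ${JSON.stringify({ token })}\n\n`;
}
```

The client then has to `JSON.parse` every data line, which costs very little next to the wait for the model.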

Here is the full production setup: a Next.js 15 Route Handler that consumes the model's own stream and re-emits it, a client reader that renders tokens as they land, and the cancellation and error handling you actually need when the thing runs for 40 seconds.
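The server half can be sketched as a Next.js App Router handler, for example at a hypothetical `app/api/report/route.ts`. `streamModelTokens` is a hypothetical stand-in for whatever streaming call your model SDK provides. The `done` and `error` event names are also assumptions for this sketch, not part of any library.

```typescript
// Hypothetical stand-in for the model SDK's streaming call. A real handler
// would yield text deltas from the provider's stream instead.
async function* streamModelTokens(_prompt: string): AsyncGenerator<string> {
  for (const fragment of ["Reentrancy ", "risk ", "in ", "withdraw()."]) {
    await Promise.resolve(); // simulate waiting on the network
    yield fragment;
  }
}

// Frame one SSE event; JSON-encoding keeps newlines inside tokens from
// being read as field or event delimiters.
function sse(data: unknown, event?: string): string {
  const head = event ? `event: ${event}\n` : "";
  return `${head}data: ${JSON.stringify(data)}\n\n`;
}

export async function POST(req: Request): Promise<Response> {
  const { source, model } = (await req.json()) as { source: string; model: string };
  const encoder = new TextEncoder();
  const tokens = streamModelTokens(`[${model}] Audit this contract:\n${source}`);

  // Pull-based stream: the next token is requested only when the consumer
  // wants more data. cancel() runs if the response stream is cancelled,
  // e.g. on client disconnect in runtimes that propagate it.
  const stream = new ReadableStream<Uint8Array>({
    async pull(controller) {
      try {
        const { value, done } = await tokens.next();
        if (done) {
          controller.enqueue(encoder.encode(sse({}, "done")));
          controller.close();
          return;
        }
        controller.enqueue(encoder.encode(sse({ token: value })));
      } catch (err) {
        // Surface model failures as an SSE event instead of a dropped connection.
        controller.enqueue(encoder.encode(sse({ message: String(err) }, "error")));
        controller.close();
      }
    },
    async cancel() {
      // Stop pulling from the model once nobody is listening.
      await tokens.return(undefined);
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream; charset=utf-8",
      "Cache-Control": "no-cache, no-transform",
    },
  });
}
```

I chose `pull` over a `start` loop because it gives backpressure for free: a slow client slows down how fast tokens are requested instead of piling them up in memory. It also makes cancellation a single hook rather than a guard around every `enqueue`.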

Why Not Just EventSource

The browser's EventSource is the obvious tool for SSE, and it handles reconnection for free. But it only does GET requests. spectr-ai sends a POST with the contract source and the chosen model in the body, so EventSource is out. We read the response stream by hand with fetch and response.body.getReader(). That also lets us cancel with an AbortController: fetch accepts its signal directly, while EventSource offers only close() and no AbortSignal hook.
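A minimal sketch of that hand-rolled reader follows. `streamReport` and its handler names are hypothetical. It assumes the server frames events as `data: {"token": ...}` with `\n` line endings and ends with an `event: done` block. The SSE spec also allows `\r\n`, which this sketch does not handle. The important part is the buffer: a network chunk can hold half an event or several, so only complete events are consumed.

```typescript
type StreamHandlers = {
  onToken: (token: string) => void;
  onDone?: () => void;
  onError?: (message: string) => void;
};

// Split one SSE event block into its event name and joined data lines.
function parseEvent(block: string): { event: string; data: string } {
  let event = "message"; // SSE default when no "event:" field is present
  const dataLines: string[] = [];
  for (const line of block.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
    // Comment lines (":keep-alive") and unknown fields are ignored.
  }
  return { event, data: dataLines.join("\n") };
}

// Hypothetical client helper: POST the job, then read the SSE body by hand.
export async function streamReport(
  url: string,
  payload: { source: string; model: string },
  handlers: StreamHandlers,
  signal?: AbortSignal,
): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    signal, // aborting rejects the fetch or the pending read()
  });
  if (!res.ok || !res.body) throw new Error(`Stream request failed: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) return; // server closed without sending a "done" event
    // stream: true holds back a multi-byte character split across chunks.
    buffer += decoder.decode(value, { stream: true });

    // Consume only complete events (terminated by a blank line); keep the rest.
    let boundary: number;
    while ((boundary = buffer.indexOf("\n\n")) !== -1) {
      const { event, data } = parseEvent(buffer.slice(0, boundary));
      buffer = buffer.slice(boundary + 2);
      if (!data) continue;
      if (event === "done") {
        handlers.onDone?.();
        await reader.cancel();
        return;
      }
      if (event === "error") {
        handlers.onError?.(JSON.parse(data).message);
        await reader.cancel();
        return;
      }
      handlers.onToken(JSON.parse(data).token);
    }
  }
}
```

In a component you would create an AbortController, pass `controller.signal` as the last argument, append each token to state in `onToken`, and call `controller.abort()` from a Stop button or an effect cleanup. The abort surfaces as a rejected promise, so catch it and treat an AbortError as a normal exit rather than a failure.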
