Live chain-of-thought in a chatbot: how to actually stream the tool calls (not just the text)

Most "streaming" LLM chatbots stream just the text. The model says "I'll search for that…" and then you wait 6 seconds while the tokens dribble in. The actual search? Hidden. The 3 scrapes it did to fact-check? Hidden. You're staring at a typing indicator that doesn't tell you anything about what's actually taking time.

I just built a chatbot where every tool call surfaces as a step in real time (🔍 search_engine, 📄 scrape_as_markdown, 📄 scrape_as_markdown), and the response then streams in token by token. The user sees the agent's chain-of-thought as it happens, not as a postmortem.
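
Here's a minimal sketch of the consumer side of that experience. It assumes the runner hands you an async stream of typed events. The event classes (`ToolCall`, `TextDelta`, `Done`), the `ICONS` table and `fake_agent()` are hypothetical stand-ins, not fi-runner's or the Claude Agent SDK's actual API. What it illustrates is that each event is folded into the view the moment it arrives, so the tool-call steps show up before any answer text exists.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, Union


# Hypothetical event types for illustration; not fi-runner's real API.
@dataclass
class ToolCall:
    name: str


@dataclass
class TextDelta:
    text: str


@dataclass
class Done:
    pass


Event = Union[ToolCall, TextDelta, Done]

# Assumed icon mapping mirroring the steps described above.
ICONS = {"search_engine": "🔍", "scrape_as_markdown": "📄"}


@dataclass
class ChatView:
    steps: list[str] = field(default_factory=list)  # one line per tool call
    text: str = ""                                   # answer streamed so far
    finished: bool = False


def apply(view: ChatView, event: Event) -> None:
    """Fold a single event into the view as soon as it arrives."""
    if isinstance(event, ToolCall):
        icon = ICONS.get(event.name, "🔧")  # fallback icon for unknown tools
        view.steps.append(f"{icon} {event.name}")
    elif isinstance(event, TextDelta):
        view.text += event.text
    elif isinstance(event, Done):
        view.finished = True


async def fake_agent() -> AsyncIterator[Event]:
    """Stand-in for the agent runner: tool calls first, then text tokens."""
    for name in ["search_engine", "scrape_as_markdown", "scrape_as_markdown"]:
        await asyncio.sleep(0)  # a real tool call would take seconds here
        yield ToolCall(name)
    for token in ["Here's ", "what ", "I found."]:
        yield TextDelta(token)
    yield Done()


async def main() -> ChatView:
    view = ChatView()
    async for event in fake_agent():
        apply(view, event)
        # A real UI would re-render here; printing stands in for that.
        if isinstance(event, ToolCall):
            print(view.steps[-1])
    return view


if __name__ == "__main__":
    final = asyncio.run(main())
    print(final.text)
```

The part worth keeping is the single `apply` function. The UI never waits for a "complete" response. It just reacts to whichever kind of event shows up next.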

The trick is that you have to stream three different things, and each layer needs to know what to do with each kind of event. Here's the architecture.
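
One way to picture "each layer needs to know what to do with each kind of event" is the hop from server to browser. The sketch below assumes that hop uses Server-Sent Events; the post doesn't say which transport it uses. It also uses hypothetical event names (`tool_call`, `text`, `done`). The server layer tags every event with its kind, and the client layer parses frames and dispatches on that tag.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, Tuple


def to_sse(kind: str, payload: dict) -> str:
    """Server layer: wrap one agent event as a Server-Sent Events frame.

    The SSE `event:` field carries the event kind so the client can route it.
    json.dumps (no indent) never emits raw newlines, so one `data:` line suffices.
    """
    return f"event: {kind}\ndata: {json.dumps(payload)}\n\n"


def parse_sse(raw: str) -> Iterator[Tuple[str, dict]]:
    """Client layer: split a complete SSE body into (event kind, payload) pairs."""
    for block in raw.split("\n\n"):
        kind, data_lines = "message", []  # "message" is SSE's default event name
        for line in block.split("\n"):
            if not line or line.startswith(":"):
                continue  # skip empty lines and comment/keep-alive lines
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space after the colon
            if field == "event":
                kind = value
            elif field == "data":
                data_lines.append(value)
        if data_lines:  # frames without any data are not dispatched
            yield kind, json.loads("\n".join(data_lines))


def dispatch(events: Iterable[Tuple[str, dict]],
             handlers: Dict[str, Callable[[dict], None]]) -> None:
    """Route each event to the handler registered for its kind."""
    for kind, payload in events:
        handler = handlers.get(kind)
        if handler is not None:  # unknown kinds are ignored rather than fatal
            handler(payload)


if __name__ == "__main__":
    # Hypothetical event names; the real runner's types may differ.
    wire = (to_sse("tool_call", {"name": "search_engine"})
            + to_sse("tool_call", {"name": "scrape_as_markdown"})
            + to_sse("text", {"delta": "Here's "})
            + to_sse("text", {"delta": "the answer."})
            + to_sse("done", {}))
    dispatch(parse_sse(wire), {
        "tool_call": lambda p: print("step:", p["name"]),
        "text": lambda p: print("token:", repr(p["delta"])),
        "done": lambda p: print("done"),
    })
```

`parse_sse` expects a complete body, which keeps the example short. A browser client would normally use `EventSource` and register one listener per event name. A streaming Python client would need to buffer partial frames across network reads.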

The shape of the stream

The agent runner (in my case, fi-runner wrapping the Claude Agent SDK) emits events of three types as they happen:
