Most "streaming" LLM chatbots stream just the text. The model says "I'll search for that…" and then you wait 6 seconds while the tokens dribble in. The actual search? Hidden. The 3 scrapes it did to fact-check? Hidden. You're staring at a typing indicator that doesn't tell you anything about what's actually taking time.

I just built a chatbot where every tool call surfaces as a step in real time (🔍 search_engine, 📄 scrape_as_markdown, 📄 scrape_as_markdown), and the response then streams token by token. The user sees what the agent is doing as it happens, not as a postmortem.

The trick is that you have to stream three different things, and each layer needs to know what to do with each kind of event. Here's the architecture.

The shape of the stream

The agent runner (in my case, fi-runner wrapping the Claude Agent SDK) emits events of three types as they happen: