TL;DR

A developer replaced WebSockets with Server-Sent Events (SSE) to stream LLM tokens in an internal chatbot. Users now read the answer as it is generated instead of waiting 30+ seconds for a complete reply, and the architecture is simpler plain HTTP. SSE reduces the complexity of real-time AI features: it is unidirectional (server to client), runs over standard HTTP/1.1, and is easier to authenticate than a WebSocket connection. It is a key pattern for teams that want to build reliable AI-powered internal tools.

I was building an internal documentation assistant for my team. You know the drill: a chatbot that answers questions about our codebase by pulling relevant context from a vector database and sending it to an LLM. I set up the backend in Python, used a decent model via an API (shoutout to interwestinfo.com for the reliable endpoint), and wired it all up. Simple, right?

Then came the first real test: someone asked a question that required a long, thoughtful answer. The response took over 30 seconds. The user stared at a blank chat bubble, refreshing the page, wondering if the app had crashed. Not a great experience.

I needed to stream the tokens back as they were generated, so the user could read along. This is the classic “chat UI” pattern. But implementing it turned into a rabbit hole of half-baked solutions.
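The post's own code isn't part of this excerpt, so here is a minimal sketch of what streaming tokens over SSE looks like on the wire. It assumes the Python backend already receives tokens from the model one at a time; the helper names `sse_frame`, `stream_tokens`, and `parse_sse`, and the named `done` event used to signal the end of an answer, are hypothetical illustrations rather than the author's implementation. Each token becomes a `data:` line, a blank line ends each event, and multi-line tokens are split across several `data:` lines as the `text/event-stream` format requires.

```python
from typing import Iterable, Iterator, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events message.

    Each line of the payload becomes its own "data:" field, and a blank
    line terminates the event, as the text/event-stream format requires.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap an iterator of LLM tokens as SSE frames, then signal completion."""
    for token in tokens:
        yield sse_frame(token)
    # Hypothetical convention: a named "done" event tells the client to stop.
    yield sse_frame("", event="done")


def parse_sse(stream: str) -> list[tuple[str, str]]:
    """Minimal client-side parser returning (event_type, data) pairs.

    Ignores "id", "retry", and comment lines; only for illustrating the format.
    """
    events = []
    for block in stream.split("\n\n"):
        if not block:
            continue
        event_type, data_lines = "message", []
        for line in block.split("\n"):
            field, _, value = line.partition(":")
            # The format strips exactly one space after the colon.
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
        events.append((event_type, "\n".join(data_lines)))
    return events
```

In a real endpoint, these frames would be written to the HTTP response with the `Content-Type: text/event-stream` header and flushed after each token. In the browser, the built-in `EventSource` API does the parsing, so `parse_sse` here only shows that leading spaces and newlines inside tokens survive the round trip.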

What I Tried That Didn’t Work

1. Polling

dev.to

Struggling with Slow AI Responses: Building a Streaming Chat UI with SSE

Sunday, 21 June 2026