TL;DR: WebSockets are the right protocol for production AI chat, but a WebSocket connection carries no session state. When it drops (the AWS ALB idle timeout defaults to 60 seconds; Cloudflare's is 100 seconds on Free and Pro plans), all in-flight tokens, tool call results, and agent context disappear. Reconnection logic restores the socket. It doesn't restore the session. That's the gap this post covers.
WebSockets are the right protocol for production AI chat. But that fact doesn't prevent the failure most teams hit first. An enterprise load balancer closes the idle connection at 60 seconds during a tool execution wait. Your reconnect logic fires in under a second, the agent keeps running server-side, and the client receives nothing from the gap. No tokens, no tool call results, no context.
The reconnected socket has no view of what happened while it was down. Three conditions cause this routinely: a proxy timeout mid-task, a page reload mid-generation, and a mobile network handoff. Each breaks for the same underlying reason: the WebSocket protocol handles transport, not session state, and reconnection logic doesn't change that.
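To make the gap concrete, here is a minimal in-memory sketch with no real network involved. Every name in it (`Session`, `AgentEvent`, `attach`, `detach`, `emit`, `simulate`) is hypothetical and exists only for illustration. With `keepLog` set to `false`, it behaves like a bare socket with reconnect logic: anything the agent emits while the socket is down is lost. With `keepLog` set to `true`, the session keeps a sequence-numbered log outside the socket and replays what the client missed. That second mode is one common pattern, shown here only for contrast, not as the definitive fix.

```typescript
// All names are hypothetical; this is a simulation, not a real WebSocket server.
type AgentEvent = { seq: number; kind: "token" | "tool_result"; data: string };

// Stand-in for a WebSocket: only the send side matters for this sketch.
type Socket = { send: (event: AgentEvent) => void };

class Session {
  private socket: Socket | null = null;
  private nextSeq = 1;
  // Session-level event log. A bare socket has nothing equivalent.
  private log: AgentEvent[] = [];
  private keepLog: boolean;

  constructor(keepLog: boolean) {
    this.keepLog = keepLog;
  }

  // Called on connect and on every reconnect.
  attach(socket: Socket, lastSeenSeq = 0): void {
    this.socket = socket;
    // Replay what the client missed. With keepLog=false the log is empty,
    // so a reconnect restores the socket and nothing else.
    for (const event of this.log) {
      if (event.seq > lastSeenSeq) socket.send(event);
    }
  }

  // Called when a proxy timeout, reload, or network handoff drops the socket.
  detach(): void {
    this.socket = null;
  }

  // The agent keeps calling this whether or not a socket is attached.
  emit(kind: AgentEvent["kind"], data: string): void {
    const event: AgentEvent = { seq: this.nextSeq++, kind, data };
    if (this.keepLog) this.log.push(event);
    this.socket?.send(event);
  }
}

function simulate(keepLog: boolean): string[] {
  const received: string[] = [];
  let lastSeenSeq = 0;
  const makeSocket = (): Socket => ({
    send: (event) => {
      received.push(event.data);
      lastSeenSeq = event.seq;
    },
  });

  const session = new Session(keepLog);
  session.attach(makeSocket());
  session.emit("token", "Looking");
  session.detach();                           // idle timeout closes the socket
  session.emit("tool_result", "rows=42");     // agent keeps running server-side
  session.emit("token", "Found");
  session.attach(makeSocket(), lastSeenSeq);  // reconnect fires
  session.emit("token", "done");
  return received;
}

console.log(simulate(false)); // [ 'Looking', 'done' ]
console.log(simulate(true));  // [ 'Looking', 'rows=42', 'Found', 'done' ]
```

In the first run the reconnect succeeds and the client still never sees the tool result. Nothing about the socket changed between the two runs; the only difference is whether state lives at the session level.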
Key takeaways
WebSockets are the right protocol for production AI chat: bidirectional, persistent, and suited to live steering and tool calls in ways SSE isn't.









