I’ve been building a code review assistant that uses an AI model to suggest improvements in real time. The idea was simple: you paste in a block of code, and the assistant streams back feedback token by token, like a ChatGPT client for your IDE. What could possibly go wrong?
Turns out, pretty much everything. The first version worked fine for a single user, but as soon as several sessions ran concurrently, the whole thing fell apart: responses were choppy, the UI froze, and sometimes the stream died mid-sentence. That’s the story I want to share today.
The Problem
I had a Flask web app with a standard REST endpoint. The frontend would POST the code; my backend would call an AI API (something like https://ai.interwestinfo.com/v1/completions), wait for the full response, and then send it back as JSON. Simple, synchronous, wrong.
# Bad version: waiting for the full response