I love building small web apps. You know the kind – side projects that start with a single npm create and end up consuming every weekend for a month. A few months ago, I decided to build a simple dashboard that used an AI model to generate summaries from user notes. Nothing fancy: drop a note, get a bullet-point summary back.

But then the reality of serverless architectures hit me. And the AI API response time. And the connection drops. And the user staring at a spinner for 15 seconds. That’s the problem I want to talk about today.

The Real Problem

My backend was a single Vercel serverless function (Node.js). I’d call the AI API, wait for the entire response, then send it to the client. Simple, right? Except AI models, especially the larger ones, can take 10–20 seconds to return a full response. During that time, the serverless function keeps running and is billed for every millisecond it spends waiting, while the user watches a loading spinner that feels like an eternity.
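To make the pattern concrete, here is a minimal sketch of that "wait for everything, then reply" flow. It is not my actual handler: `fakeSummarize`, `bufferedHandler`, and `HandlerResult` are hypothetical names, and the timer simply stands in for the AI API's latency, so treat it as an illustration under those assumptions.

```typescript
// Hypothetical stand-in for the AI API call: the promise resolves only
// once the whole summary is ready. delayMs simulates model latency.
type Summarize = (note: string) => Promise<string>;

function fakeSummarize(delayMs: number): Summarize {
  return (note) =>
    new Promise<string>((resolve) =>
      setTimeout(
        () => resolve(`- summary of ${note.length} characters`),
        delayMs
      )
    );
}

interface HandlerResult {
  body: string;
  waitedMs: number; // time the client spent with nothing to render
}

// The buffered pattern: await the full response, then reply in one piece.
async function bufferedHandler(
  note: string,
  summarize: Summarize
): Promise<HandlerResult> {
  const started = Date.now();
  // The function stays alive (and billable) for this entire await,
  // and the client receives nothing until it completes.
  const body = await summarize(note);
  return { body, waitedMs: Date.now() - started };
}
```

Calling `bufferedHandler(note, fakeSummarize(15000))` would leave the caller with nothing to show for roughly 15 seconds, which is exactly the spinner problem: the wait the user sees equals the full generation time.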

I remember the first time I tested it with a five-paragraph note. I clicked the button, got myself a glass of water, came back, and the spinner was still spinning. Not acceptable.