Two months ago, I shipped a customer support chatbot for my SaaS product. It worked great for the first three messages. Then it started forgetting what the user said earlier, repeating itself, and giving contradictory advice. Users noticed. One wrote: "Your bot has the memory of a goldfish."

I had hit the classic LLM context window wall. My initial implementation just stuffed the entire conversation history into the prompt, which worked until conversations grew beyond 4k tokens. Then I tried truncation, but that lost critical context. The problem felt unsolvable: either break the bank on a bigger context window or lose information.

Here's what I tried, what failed, and the approach that finally let my chatbot hold coherent multi-turn conversations without blowing up my API costs.

The naive approach: just keep adding messages

My first attempt was embarrassingly simple: