I spent a week fixing my chatbot's memory — here's what worked

Two months ago, I shipped a customer support chatbot for my SaaS product. It worked great for the first three messages. Then it started forgetting what the user said earlier, repeating itself, and giving contradictory advice. Users noticed. One wrote: "Your bot has the memory of a goldfish."

I had hit the classic LLM context window wall. My initial implementation just stuffed the entire conversation history into the prompt. That worked until conversations grew beyond 4k tokens. Then I tried truncation, but that lost critical context. The problem felt unsolvable without either breaking the bank on bigger context windows or losing information.

Here's what I tried, what failed, and the approach that finally let my chatbot hold coherent multi-turn conversations without blowing up my API costs.

The naive approach: just keep adding messages

My first attempt was embarrassingly simple:

I spent a week fixing my chatbot's memory — here's what worked

Related reading

Why my AI chatbot kept forgetting things (and how I fixed it)

My Support Bot Kept Making Stuff Up — Here's How I Fixed It

10x Faster LLM Memory Testing: From Manual Verification to Pytest Automation

How I Fixed My AI Chatbot's Timeout Nightmare

Building MemBot AI: Creating a Customer Support Assistant with Persistent Memory

Why I Stopped Using Chat History and Used Hindsight Memory