At 2 AM, my boss sent three frantic messages in a row: “Users say the AI doesn’t remember what they discussed half an hour ago. Is this a bug?” Still half-asleep, I opened the monitoring dashboard and saw that the service had been restarted for a rolling update three hours earlier. The conversation history that had lived in process memory was gone: total amnesia. Even worse, right before the restart a user had spent 20 minutes explaining their business rules, and now the AI was acting like a brand-new intern, asking “How can I help you?” That’s the ugly side of storing conversation history nowhere but in process memory.

Breaking down the problem

When you build an AI chat service with memory, the most common approach is to use LangChain’s ConversationBufferMemory, which stuffs the whole conversation history into every prompt. It works fine in development, but as soon as it hits production, the cracks appear:

On restarts or scale-out, in-memory history simply evaporates, and a request routed to a different replica never sees it in the first place. Users have to re-explain everything they just told the AI.

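Long conversations (customer support, legal consultations, code reviews) easily accumulate thousands of tokens. Sending all of them with every request drives up cost and latency and eventually overflows the model’s context window. The usual fallback, ConversationBufferWindowMemory, keeps only the last N turns, so critical context from early in the conversation often gets lost.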
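To make these failure modes concrete, here is a minimal, dependency-free Python sketch. It is not LangChain’s implementation: `InMemoryChatService`, `replay`, and the refund rule are hypothetical illustrations, and the `window_k` field only mimics the idea behind ConversationBufferWindowMemory’s `k` parameter. A freshly constructed instance stands in for a restarted process or a second replica that has never seen the conversation.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-in for a chat service that keeps history ONLY in process
# memory. window_k=None keeps every exchange (buffer-style); an int keeps only
# the last k exchanges (window-style). Not LangChain code.
@dataclass
class InMemoryChatService:
    window_k: Optional[int] = None
    # session_id -> list of (role, text); lives and dies with this object
    sessions: dict = field(default_factory=dict)

    def record(self, session_id: str, user_msg: str, ai_msg: str) -> None:
        turns = self.sessions.setdefault(session_id, [])
        turns.append(("Human", user_msg))
        turns.append(("AI", ai_msg))

    def build_prompt(self, session_id: str, new_msg: str) -> str:
        turns = self.sessions.get(session_id, [])
        if self.window_k is not None:
            # One exchange = one Human message + one AI message.
            keep = 2 * self.window_k
            turns = turns[-keep:] if keep > 0 else []
        lines = [f"{role}: {text}" for role, text in turns]
        lines += [f"Human: {new_msg}", "AI:"]
        return "\n".join(lines)


def replay(service: InMemoryChatService) -> InMemoryChatService:
    # One early business rule, followed by ordinary follow-up questions.
    service.record("u1", "Refunds over $500 need manager approval.", "Understood.")
    for i in range(5):
        service.record("u1", f"Follow-up question {i}", f"Answer {i}")
    return service


question = "Can I refund $800 without approval?"

buffer = replay(InMemoryChatService())               # keeps everything
window = replay(InMemoryChatService(window_k=3))     # keeps last 3 exchanges
restarted = InMemoryChatService()                    # new process / other replica

for name, svc in [("buffer", buffer), ("window k=3", window), ("after restart", restarted)]:
    prompt = svc.build_prompt("u1", question)
    print(f"{name:14} rule kept: {'manager approval' in prompt}, prompt chars: {len(prompt)}")
```

The buffer variant still carries the refund rule, but its prompt grows with every exchange. The windowed variant stays small yet has already pushed the rule out, and the restarted instance has nothing at all.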