At 2:17 a.m., a chain of alarms yanked me out of sleep — the LLM in production had suddenly "lost its memory." One moment users were discussing project timelines, the next it was naïvely asking "How can I help you?" After digging through logs, I found the culprit: a single line change to Redis’s expiry policy during a memory-store release. Nobody had tested the memory persistence flow before deployment, causing every session’s context to expire within 5 minutes. While fixing the bug, I cursed to myself: if only we had automated, repeatable regression tests that never touch production data, this would never have happened.

Breaking down the problem

At its core, an LLM memory store is a key–value system with TTL: session IDs serve as keys, and dialogue history, summaries, vector indexes are serialized and dumped into Redis. The business requirement is clear — "7×24 multi-turn conversations must never lose memory." Yet our testing process was stuck here:

Fire a few messages manually via Postman, then eyeball Redis keys to guess if it’s working.

Test data shares the same Redis instance as production; one slip and you’ve deleted real sessions.