LLM Memory System Pitfalls: A 3-Hour Bug Hunt Solved with Pytest Snapshot Testing

It was 2 a.m. when the on-call alert jolted me awake: our production agent had suffered “amnesia” for three consecutive conversations. The context the user had carefully built was gone, and complaints were flooding in. Squinting at the logs, I discovered that an innocuous-looking refactor had broken the rollback method in the memory management module. Instead of undoing only the erroneous operation, the rollback wiped out the entire conversation history. Worse still, our existing unit tests never caught the bug: they always started from a fresh, empty database, so they could never cover a cross-session scenario like “roll back dirty data to a previous snapshot.” I spent three hours debugging, manually simulating intermediate states, before I finally pinpointed the root cause. That’s when it hit me: we weren’t lacking tests; we were missing snapshot tests that capture the entire “memory state.”

Problem Breakdown

Our LLM memory system uses SQLite for local persistence. Each session owns a table that stores conversation turns, vector summaries, and tool-call records. Two critical operations are:

save_snapshot(session_id): serializes the full state of a session into the snapshots table, creating a rollback checkpoint.
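To make the checkpoint idea concrete, here is a minimal sketch of what such a save_snapshot could look like on top of Python's built-in sqlite3 module. It is not our production code. The explicit conn parameter, the session_<id> table naming, the (id, kind, payload) columns, and the layout of the snapshots table are all assumptions made for illustration.

```python
import json
import sqlite3


def _session_table(session_id: str) -> str:
    # Hypothetical naming scheme: one table per session, e.g. "session_abc".
    # Table names cannot be bound as SQL parameters, so validate before use.
    if not session_id.isalnum():
        raise ValueError("session_id must be alphanumeric")
    return f"session_{session_id}"


def save_snapshot(conn: sqlite3.Connection, session_id: str) -> int:
    """Serialize all rows of one session into the snapshots table.

    Returns the id of the new snapshot row, usable as a rollback checkpoint.
    """
    table = _session_table(session_id)
    # Assumed columns: id (ordering), kind (turn / summary / tool_call), payload.
    rows = conn.execute(
        f"SELECT kind, payload FROM {table} ORDER BY id"
    ).fetchall()
    state = json.dumps([{"kind": kind, "payload": payload} for kind, payload in rows])

    # Assumed snapshots layout: one JSON blob per checkpoint.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, "
        "state TEXT NOT NULL)"
    )
    cur = conn.execute(
        "INSERT INTO snapshots (session_id, state) VALUES (?, ?)",
        (session_id, state),
    )
    conn.commit()
    return cur.lastrowid
```

Storing the whole session as one JSON blob keeps the checkpoint self-contained: a test can load a saved state, run an operation against it, and compare the result with an expected state, rather than always starting from an empty database.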
