This is not proof. It is early, messy evidence from my own workflow: three failures, one small comparison, and one schema bug I missed.
I'd spent a week arguing, in public, that AI memory should be built on discipline before infrastructure: preserve corrections, preserve unresolved questions, decide which record wins. The framework has three layers: summary memory for continuity, correction memory for repeated mistakes, and unresolved memory for questions that should not be settled yet.
Good theory.
Then three days of unplanned failure tested every claim without asking my permission, and one comment forced me to test the framework more carefully.
Here's what held, what the numbers actually said, and the flaw I didn't see coming.








