A major AI memory provider published their own research this spring measuring how well their system actually works in production. The controlled benchmark result was impressive: over ninety percent accuracy on standard evaluation corpora. The production result at thirty days was forty-nine percent.

That gap -- ninety-one to forty-nine -- is worth sitting with for a moment. The same system. The same vendor. The same definition of "working." The difference is what happens when the system runs continuously against real workloads instead of curated test sets.

This is not a vendor failing to disclose their results. They published the data themselves, in a public research report. That transparency is worth acknowledging. But the gap also tells you something important about what "AI agent memory" is actually solving -- and what it is not.

Why Auto-Capture Memory Degrades

The core challenge with automatic memory accumulation is that agents do not save discrete, well-structured facts. They save inferences, summaries, and working conclusions -- and those accumulate in ways that eventually contradict each other.