I maintain mnemo, an MCP-native embedded memory database for agents. Its read path is retrieval: hybrid search (vector + BM25 + graph + recency) fused with reciprocal rank fusion (RRF). This week two papers argued that retrieval-from-a-bank is the wrong default for long-horizon agents. Here is how I'm reading them, as the person whose product is implicated.
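For readers who haven't met RRF, here is a minimal sketch of the textbook formula: each ranker contributes 1 / (k + rank) per item, and the contributions are summed. This is not mnemo's actual code. The function name `rrf_fuse`, the constant k = 60 (a common default in the literature), and the memory ids are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Hypothetical helper for illustration only; k=60 is an assumed default.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; an id missing from a list contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up result lists from the four signals (ids are placeholders).
vector = ["m3", "m1", "m7"]
bm25 = ["m1", "m9", "m3"]
graph = ["m7", "m1"]
recency = ["m9", "m3", "m1"]

fused = rrf_fuse([vector, bm25, graph, recency])
print(fused)
```

In this toy input, `m1` comes out on top because it appears in all four lists, even though it is first in only one of them. That is the property that matters for the rest of this post: RRF rewards agreement across rankers, but it can only reorder entries that are already in the bank.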
The two papers
Mem-π (ServiceNow + Mila, arXiv:2605.21463) trains a separate model to generate guidance on demand instead of retrieving static entries. It decides when to emit guidance and what to emit, and it can abstain. Result: >30% relative improvement on web-navigation tasks over retrieval-based and prior RL memory baselines.
MINTEval (UNC, arXiv:2605.18565, code released) benchmarks memory under interference: facts get revised and contradicted across contexts of up to 1.8M tokens. Across 7 systems (long-context, RAG, memory frameworks), average accuracy is 27.9%, with the worst results on multi-target aggregation. Diagnosis: the bottleneck is retrieval and memory construction, and it gets worse as updates pile up.
What they get right














