The Missing Test Suite for AI Agent Memory

There's a strange gap in the AI agent stack. Prompts have LangSmith. RAG pipelines have Ragas. APIs have Postman. But memory, the thing that makes an agent remember who the user is, what they said, and what they want, has no testing tools at all.

This means most teams find out about memory failures from their users. A customer says "I already told you my name." A support ticket gets reopened because the agent asked for the account ID that was provided three messages ago. An agent recommends steak to someone who said they're vegan.

These are testable problems. They just haven't been tested because the tooling didn't exist.
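
To make "testable" concrete, here is a minimal sketch of what one such check could look like. Every name in it (`KeywordMemory`, `check_recall`, the `store`/`recall` interface) is hypothetical and made up for illustration; it's plain Python, not the API of any real memory library or framework. The toy backend does naive keyword matching, which is enough to show one check passing and one failing.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordMemory:
    """Toy memory backend (hypothetical): keeps raw user messages in a list."""
    messages: list[str] = field(default_factory=list)

    def store(self, text: str) -> None:
        self.messages.append(text)

    def recall(self, query: str) -> list[str]:
        # Return every stored message that shares at least one word with the query.
        words = set(query.lower().split())
        return [m for m in self.messages if words & set(m.lower().split())]


def check_recall(memory, turns, query, expected):
    """Feed the turns into memory, then check the expected fact is recalled."""
    for turn in turns:
        memory.store(turn)
    recalled = " ".join(memory.recall(query)).lower()
    return expected.lower() in recalled


turns = ["My name is Priya.", "I'm vegan, so no meat please."]

# Same wording as the original message: the keyword match finds it.
name_ok = check_recall(KeywordMemory(), turns, "What is the user's name?", "priya")

# Paraphrased question with no shared words: the keyword match misses it.
diet_ok = check_recall(KeywordMemory(), turns, "any dietary restrictions for dinner?", "vegan")

print({"remembers_name": name_ok, "remembers_diet": diet_ok})
```

The second check fails because the question never uses the word "vegan", which is exactly the kind of miss that ends with steak being recommended. A check like this catches it before a user does.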

I built memeval to fill this gap. It's an open-source framework that runs standardized test scenarios against any memory backend and tells you what passes, what fails, and why.