Constant-Cost Persistent Semantic State Memory Engine for LLM Agents

Author(s): Michael Neuberger

Originally published on Towards AI.

If you’ve shipped anything with an LLM in the loop, you know the shape of the bill. Turn one is cheap. Turn fifty is not. Turn five hundred is something you start architecting around. The conversation history grows, the prompt grows, and every call ships the entire past back into the model. The per-call cost is, almost insultingly, linear in the thing you actually want more of: useful interaction.
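To make the shape of that bill concrete, here is a back-of-the-envelope sketch. It assumes every turn adds the same number of tokens and that each call resends all turns so far; the function names and the 200-tokens-per-turn figure are illustrative assumptions, not measurements from the article.

```python
def prompt_tokens_full_history(turn: int, tokens_per_turn: int) -> int:
    """Input tokens for call number `turn` when every turn so far is resent.

    Assumes (for illustration only) that each turn contributes the same
    number of tokens, so the prompt grows linearly with the turn number.
    """
    return turn * tokens_per_turn


def cumulative_tokens_full_history(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across calls 1..turns.

    Sum of an arithmetic series: tokens_per_turn * (1 + 2 + ... + turns).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def prompt_tokens_fixed_state(state_tokens: int, tokens_per_turn: int) -> int:
    """Input tokens per call when a fixed-size state replaces the history.

    The footprint is the state plus the new message, independent of the
    turn number.
    """
    return state_tokens + tokens_per_turn
```

Under these assumptions, with 200 tokens per turn, call 1 sends 200 input tokens, call 50 sends 10,000, and call 500 sends 100,000. The per-call cost is linear, so the running total over a whole conversation grows quadratically: about 25 million input tokens across 500 turns.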

The standard answers are familiar. Truncate. Summarize. Stuff the recent N turns and pretend turn 1 didn’t matter. Build a vector store on the side and hope retrieval picks the right chunk. Each of these works, sort of, until it doesn’t.

Semvec is a different bet: replace the unbounded conversation history with a fixed-size semantic state plus a tiered, content-aware memory. Turn 10 and turn 10,000 carry the same input footprint. The agent still has structured access to decisions, invariants, error patterns, and prior context across sessions, but it pays for that access with a constant, not a growing line item.

