Author(s): Michael Neuberger
Originally published on Towards AI.
If you’ve shipped anything with an LLM in the loop, you know the shape of the bill. Turn one is cheap. Turn fifty is not. Turn five hundred is something you start architecting around. The conversation history grows, the prompt grows, and every call ships the entire past back into the model. The per-call cost is, almost insultingly, linear in the thing you actually want more of: useful interaction. Summed over a whole conversation, the bill grows roughly quadratically.
The standard answers are familiar. Truncate. Summarize. Stuff the recent N turns and pretend turn 1 didn’t matter. Build a vector store on the side and hope retrieval picks the right chunk. Each of these works, sort of, until it doesn’t.
Semvec is a different bet: replace the unbounded conversation history with a fixed-size semantic state plus a tiered, content-aware memory. Turn 10 and turn 10,000 carry the same input footprint. The agent still has structured access to decisions, invariants, error patterns, and prior context across sessions — but it pays for that access with a constant, not a growing line item.
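To make the difference in cost shape concrete, here is a minimal back-of-the-envelope sketch. It is not Semvec's API and the numbers are not measurements: the function names, the 200 tokens per turn, and the 2,000-token state size are all hypothetical, chosen only to show how the input footprint scales when the full history is resent versus when a fixed-size state is sent with the current turn.

```python
def full_history_cost(turns: int, tokens_per_turn: int) -> tuple[int, int]:
    """Input tokens when every call resends the whole conversation.

    Assumes (hypothetically) that each turn adds `tokens_per_turn` tokens
    and that call k carries all k turns so far.
    Returns (tokens on the last call, cumulative tokens over all calls).
    """
    last_call = turns * tokens_per_turn
    # Sum of 1..turns, times tokens per turn: quadratic in the turn count.
    cumulative = tokens_per_turn * turns * (turns + 1) // 2
    return last_call, cumulative


def fixed_state_cost(
    turns: int, tokens_per_turn: int, state_tokens: int
) -> tuple[int, int]:
    """Input tokens when each call carries a fixed-size state plus one turn.

    `state_tokens` is an assumed, illustrative size for the semantic state.
    Returns (tokens on any call, cumulative tokens over all calls).
    """
    per_call = state_tokens + tokens_per_turn
    return per_call, per_call * turns


if __name__ == "__main__":
    TOKENS_PER_TURN = 200  # hypothetical average turn size
    STATE_TOKENS = 2_000   # hypothetical fixed state size
    for n in (10, 50, 500):
        full = full_history_cost(n, TOKENS_PER_TURN)
        fixed = fixed_state_cost(n, TOKENS_PER_TURN, STATE_TOKENS)
        print(f"turn {n:>3}: full history {full}, fixed state {fixed}")
```

Under these assumptions the fixed-state approach is slightly more expensive for short conversations, because the state costs more than a handful of turns, and the per-call footprints cross at the point where the accumulated history outgrows the state. Past that point the per-call cost stays flat while the full-history cost keeps climbing.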













