Most agents drag their entire past into every turn. A better default: keep a thin index of what was said hot, and fetch only the few turns you actually need — intact, on demand.
Code: github.com/NirajPandey05/jit_context
There is a quiet assumption baked into how most agents handle memory: that more context is safer than less. If the model might need something, put it in the window. The conversation grows, every prior turn rides along on every new request, and we trust the model to find the part that matters.
That assumption breaks twice. It breaks on cost, because an agent loop re-sends its whole window on every step — a hundred stale turns aren't paid for once, they're paid for on turn 101, 102, and every step after. And it breaks on quality, because models don't read a long window evenly. Relevant facts buried in the middle get underweighted; irrelevant bulk competes for attention with the thing that actually answers the question. Past a point, a bigger context produces a worse answer, not just a costlier one.
So the interesting question isn't "how do we fit more in?" It's "how do we keep the window small and dense without losing the one old turn that matters?" This post is the design we built around that question — for the specific case of long conversation history — plus the benchmark we used to keep ourselves honest.








