Load late, load little: just-in-time context for conversation history

Most agents drag their entire past into every turn. A better default: keep a thin index of what was said always in the window, and fetch only the few turns you actually need, intact and on demand.

Code: github.com/NirajPandey05/jit_context

There is a quiet assumption baked into how most agents handle memory: that more context is safer than less. If the model might need something, put it in the window. The conversation grows, every prior turn rides along on every new request, and we trust the model to find the part that matters.

That assumption breaks twice. It breaks on cost, because an agent loop re-sends its whole window on every step — a hundred stale turns aren't paid for once, they're paid for on turn 101, 102, and every step after. And it breaks on quality, because models don't read a long window evenly. Relevant facts buried in the middle get underweighted; irrelevant bulk competes for attention with the thing that actually answers the question. Past a point, a bigger context produces a worse answer, not just a costlier one.
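To make the cost half of that argument concrete, here is a rough back-of-the-envelope sketch. It is not taken from the repository or its benchmark: the function names, the flat tokens-per-turn figure, the index entry size, and the number of fetched turns are all assumptions chosen for illustration. It compares the total input tokens sent over a conversation when every step re-sends all prior turns against a step that sends only a short index entry per prior turn plus a few turns fetched in full.

```python
def full_history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when every step re-sends all turns so far.

    Step k sends turns 1..k, so the total is
    tokens_per_turn * (1 + 2 + ... + turns), which grows quadratically.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def indexed_history_tokens(
    turns: int,
    tokens_per_turn: int,
    index_tokens_per_turn: int,
    fetched_turns: int,
) -> int:
    """Total input tokens under an assumed index-plus-fetch scheme.

    Each step sends: one short index entry per prior turn, up to
    `fetched_turns` prior turns in full, and the current turn in full.
    This cost model is a simplification, not the project's measured behavior.
    """
    total = 0
    for step in range(1, turns + 1):
        prior = step - 1
        index_cost = prior * index_tokens_per_turn
        fetch_cost = min(fetched_turns, prior) * tokens_per_turn
        total += index_cost + fetch_cost + tokens_per_turn
    return total


if __name__ == "__main__":
    # Illustrative numbers only: 200 turns of about 300 tokens each,
    # 15-token index entries, 3 turns fetched in full per step.
    full = full_history_tokens(200, 300)
    indexed = indexed_history_tokens(200, 300, 15, 3)
    print(f"full history:  {full:,} input tokens")
    print(f"index + fetch: {indexed:,} input tokens")
```

The index still grows with the conversation, so the second total is not flat either. The point of the sketch is only that the per-turn multiplier drops from a full turn to a short index entry, which is where the savings would come from under these assumptions.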

So the interesting question isn't "how do we fit more in?" It's "how do we keep the window small and dense without losing the one old turn that matters?" This post describes the design we built around that question, for the specific case of long conversation history, plus the benchmark we used to keep ourselves honest.
