The interesting question about coding agents in 2026 is not whether they work. It is which patterns hold up once you point them at code that has consequences. After roughly eighteen months of running Claude, Codex, and a rotating cast of free-tier models against a real equity research stack at Leviathan, a small set of patterns keep paying for themselves. The rest get pruned within a week.

This note is a field log, not a tutorial. The frame is engineering, not capability. The question I keep asking is: what survives contact with production?

Context is the budget, not the prompt

The mental model that broke first was treating context as a free resource. A 1M token window does not mean you can stream 1M tokens of garbage into the model. It means you have a budget. Every token of tool output, every diff, every retrieved chunk is a withdrawal from a fund that determines how much reasoning the model can do downstream.

The pattern that holds up is structured compression at the tool boundary. When a tool returns 50KB of JSON, do not pass 50KB into the model. Pass a deterministic summary built from the JSON: counts, top-K items, the specific fields the downstream step needs. Keep the raw blob on disk under a handle. If the model needs more, it can ask.