I used to think the problem was the agent.

I would hand it a large JSON export and ask a reasonable question: what changed, what looks risky, what should we investigate before release?

It would find something. It always found something.

But it missed fields. It over-indexed on irrelevant values. It hallucinated patterns in JSON that weren't there. It noticed one dramatic-looking record and ignored the boring distribution that made the record meaningful. So I tried the usual fixes: stricter prompts, longer instructions, bigger context windows, more examples.

The real problem was simpler.