I used to think the problem was the agent.
I would hand it a large JSON export and ask a reasonable question: what changed, what looks risky, what should we investigate before release?
It would find something. It always found something.
But it missed fields. It over-indexed on irrelevant values. It hallucinated patterns that weren't in the JSON. It noticed one dramatic-looking record and ignored the boring distribution that made the record meaningful. So I tried the usual fixes: stricter prompts, longer instructions, bigger context windows, more examples.
The real problem was simpler.






