Verify the Work, Not the Report: a coding agent's success claim is just a claim

A sub-agent's success report is generator output, not ground truth. Verify the work yourself, and reward the agent that refuses a false premise.

In one session this spring I sliced a workspace-wide rename across a handful of sub-agents, dispatched them one at a time, and validated each commit myself before sending the next. That one habit caught three confident, wrong reports in a single afternoon.

The first was a pass that wasn't a compile. A slice came back green, and the build check it cited had finished in 0.34 seconds. A third of a second is not a compilation of a large workspace; it is a cached echo of an earlier one. A real build, run by hand, took 2.4 seconds and genuinely passed — but the report had no way of knowing that, because it was reading the cache, not the code.

The second was a test result that could not exist. A slice reported 121 passing tests from a build target that, run as the report described it, errors out with did not match any packages. A command that fails to resolve its target cannot also count 121 green tests. The number was real; the command attached to it was not. Run the correct way — against the right manifest — the tests did pass. The work was sound. The account of how it was checked was fiction.

A sub-agent's success report is generator output, not ground truth. Verify the work yourself, and reward the agent that refuses a false premise.

Verify the Work, Not the Report: a coding agent's success claim is just a claim

Verify the Work, Not the Report: a coding agent's success claim is just a claim

Related reading

AI Coding Tip 024 - Force a Criteria Check Before the Task Ends

Your Agent Checked Everything. It Was Still Wrong.

I got tired of asking coding agents “are you sure?”, so I built a local…

How to grade an AI agent's output before it ships

The Agent Told Me It Was Done. The Tests Said Otherwise.

How to prove an AI agent output existed — x402 + NEAR anchoring in practice

Related reading

AI Coding Tip 024 - Force a Criteria Check Before the Task Ends

Your Agent Checked Everything. It Was Still Wrong.

I got tired of asking coding agents “are you sure?”, so I built a local…

How to grade an AI agent's output before it ships

The Agent Told Me It Was Done. The Tests Said Otherwise.

How to prove an AI agent output existed — x402 + NEAR anchoring in practice