Don't Make the Agent Re-Run the Test Suite to Find the Failure

Here is a small failure mode that cost me time for longer than it should have.

The agent would run the test suite. Tests would fail. The agent would announce that tests failed. Then, when it needed to know which tests failed, would either guess, ask, or run the suite a second time to scrape the output. Sometimes it would run it a third time. The failing tests were right there in the output of the first run; the agent didn't read them carefully enough to remember.

This is not the model being lazy. It is the model being honest about its attention. Test output is voluminous, the failure summary is buried at the bottom, and by the time the agent is reasoning about the next step ("which test failed, what assertion, what file"), the relevant lines have scrolled past whatever the agent actually retained. The cheapest way to get the information back is to run the command again.

Running the command again is the wrong answer. A full suite takes minutes. Doing it twice to learn something the first run already told you is a tax on every red-test loop, and the loops compound.

The fix: tee everything to a gitignored directory

Here is a small failure mode that cost me time for longer than it should have.

Running the command again is the wrong answer. A full suite takes minutes. Doing it twice to learn something the first run already told you is a tax on every red-test loop, and the loops compound.

The fix: tee everything to a gitignored directory

Don't Make the Agent Re-Run the Test Suite to Find the Failure

Don't Make the Agent Re-Run the Test Suite to Find the Failure

Related reading

Why Your Test Suite Starts Failing Six Months Later, and What to Do About It

AI Agent Debugging Checklist: From Failed Run to Root Cause

Your Agent Failed in Prod. Good Luck Reproducing It.

Why your agentic system doesn't survive its own success — and what a…

Stop Asserting Equality: How to Test Agents When Every Run Is Different

Why Your AI Agent Works in Dev and Breaks in Prod

Related reading

Why Your Test Suite Starts Failing Six Months Later, and What to Do About It

AI Agent Debugging Checklist: From Failed Run to Root Cause

Your Agent Failed in Prod. Good Luck Reproducing It.

Why your agentic system doesn't survive its own success — and what a…

Stop Asserting Equality: How to Test Agents When Every Run Is Different

Why Your AI Agent Works in Dev and Breaks in Prod