The agent that fixes bugs by running the code

I want to tell you about the first time I watched a coding agent fix a bug correctly on the first try.

It wasn't the model. The model was the same one I'd been using for months. It was the loop around the model that had changed.

The agent didn't open three files, skim them, write a confident theory in its head, and edit something that "looked like" the cause. That was the old loop. The one where you run the code, see the same bug, and watch the agent shrug and try another file.

This time the agent did something different. It listed five hypotheses. Picked one. Added some logging. Then it stopped and asked me to reproduce the failure. I clicked the button. The bug fired. The agent read the logs, confirmed the hypothesis, wrote a two-line fix, and pulled its logging back out.

The whole thing took maybe four minutes.

The agent that fixes bugs by running the code

Related reading

Why AI Coding Agents Need Work Attempts, Leases, and Checkpoints

Why I built an AI repair loop that stops after one fix on purpose

How I built mechanical enforcement for AI coding agents — and why prompts…

Your AI coding agent keeps re-trying the fix that already failed. I built a…

The bug that never crashed: how I fuzzed an AI's own code sandbox and found it…

"Loop Engineering: How I Stopped My AI Agent From Reward-Hacking Its Own…