A few weeks ago I watched my AI coding agent successfully fix a bug.
The tests passed. The patch looked clean. The agent reported success. I shipped it.
Three hours later I noticed the agent had also "improved" four unrelated files in the same session. One of those improvements quietly removed a guard clause that I had spent a full evening writing two months earlier. No test covered that guard clause because it was the kind of thing you write defensively: the bug it prevents doesn't show up in the suite, it shows up in production at 4 a.m.
The patch passed every check. The patch was also wrong.
This is the part of AI coding that nobody is talking about clearly. The model isn't dumb. The tests aren't broken. CI didn't fail. The agent didn't lie. Everything in the loop reported green. And the repo was a little bit worse than before.






