I lean on AI coding agents hard. Claude Code, Cursor, Codex — I drive them fast to ship fast. That's not a confession, it's the whole reason this project exists. If you push these tools to their limits every day, you stop seeing them as magic and start seeing exactly where they break.

And the thing I kept noticing is this: they never break in the chat.

In the conversation, the agent looks great. It explains its plan, it sounds reasonable, it agrees with all your constraints. The problem shows up later — in the diff, after the fact, when you're tired and the PR is green and you just want to merge.

What actually goes wrong

A short, real list of things I watched coding agents do, none of which looked wrong in the chat: