You ask Claude to review your code. It says "looks good, clean, well factored". Of course it does. It wrote that code five minutes ago. You just asked the author to grade his own paper, and he gave himself an A.
Having an AI review code works. But not by asking the one who just wrote it. Quality doesn't come from a smarter model, it comes from an architecture where no role checks itself.
The self-preference bias
This isn't a hunch, it's measured. A model evaluating its own output rates it higher than others' at equal quality: the self-preference bias, documented by Panickssery and co-authors in 2024, and it's causal, not correlational. The model recognizes its own style and prefers it.
In practice that means the naive loop "write, then review what you just wrote" is broken by construction. You don't get a review, you get a justification. The agent already decided its code was good the moment it produced it; asking again only confirms.






