AI agents now write enormous amounts of code, and it usually looks right. It compiles, it passes the tests, it reads cleanly in review. But "looks right" and "does what it said it would" are different things — and the gap between them is where the real bugs hide now.

I keep seeing the same failure mode: a pull request claims one thing, and the diff quietly does another. The reviewer reads the claim, confirms the claim is present, and sails right past the part nobody mentioned. At PR volume — especially with agents generating the code — humans simply can't audit every change for

Silent scope — the change does more than it claims

The most dangerous one, because the extra behavior hides behind a true, narrow description.

import { logger } from "../log";