In 2025, a research nonprofit called METR ran a careful experiment. They took 16 experienced open-source developers, gave them 246 real tasks on codebases they'd worked in for years, and randomly assigned each task to either allow AI tools or forbid them. Then they timed everything.

The developers expected AI to make them about 24% faster. After the study, they reported feeling about 20% faster.

They were actually 19% slower.

Read that again, because it's the whole problem in three numbers. The people doing the work were confident AI sped them up. The stopwatch said the opposite. And if those developers couldn't trust their own gut about whether AI was helping, your engineering org can't trust a vibe in a planning meeting either.

So how do you actually tell? Not "does AI feel productive," because almost anyone will say yes, but "is this thing making the team ship better software faster, or just generating more motion?" That's a measurement question, and most of the ways people answer it are wrong. Let's fix that.