In 2025, METR — an AI safety and capability research organization — ran a rigorous randomized controlled trial. Sixteen experienced open-source developers worked on 246 real-world tasks, each randomly assigned to either use AI coding tools freely or not at all.

The result was counterintuitive: when developers were allowed to use AI tools, they took 19% longer to complete their tasks.

Before the study, those same developers predicted AI would make them 24% faster. After completing the experiment, they still believed they had gone faster, estimating a speedup of roughly 20%. The measured slowdown had barely dented their confidence.

The lesson of the finding was not the one people assumed. It was not "AI is useless." It was this: the bottleneck is not model quality. It is context quality.

The developers who slowed down were spending significant time on what researchers call "verification overhead" and "workflow friction": the effort required to correct AI output that ignored the architectural constraints, naming conventions, existing utility functions, and established patterns of the codebase they were working in. The AI was generating code. It was generating code for an imaginary system.