You watch Claude Code analyze your repository. Files flash by. Symbols get resolved. It's working.
But how well is it working?
Here's a thought: we measure AI coding agents on the wrong metric.
We ask: "Did it complete the task?" But the better question is: "Did it stay focused while solving it?"
An agent that bounces between unrelated files three times, re-reads the same code, and loses semantic context is inefficient—even if it eventually gets the answer right. And we have no way to measure that.






