We’ve been trying to figure out a real answer to a question that keeps coming up: how do you measure whether someone is actually good at Claude Code, Codex, and the other AI coding tools? Not "do they use them," but "how well do they use them."

The first metric we looked at, like everyone else, was token usage. It’s the only number you can pull out of the box. Anthropic and OpenAI hand you token data in the console. So token usage becomes an easy first answer.

But obviously counting tokens sucks as a metric.

What we noticed when we looked at the actual sessions

When we started reading session logs from people who were clearly good with these tools and from people who were clearly struggling, we saw that both groups burned tokens. Sometimes the strugglers burned more.