Claude Code and similar coding agents are strongest when the task requires judgment: architecture, unclear requirements, production incidents, security-sensitive logic, and hard debugging.

The problem is that many teams also send routine work to that same top-tier model: small tests, README edits, type fixes, lint cleanup, examples, wrappers, and narrow bugs.

At small scale this feels fine. Once several people use coding agents every day, the workflow becomes harder to operate.

The issue is not only model price. It is also broken context, shared API keys, unclear limits, and no clean way to answer basic questions:

Which project used the most budget?