I have a production Claude agent that has been running for about four months. It does code review on incoming PRs, drafts changelog entries, and occasionally summarizes a Slack channel. Nothing exotic. Nothing the marketing pages would put on a banner.
It was burning 5.2 million tokens a month. I knew that because Anthropic's invoice told me. What the invoice did not tell me was where the tokens were going. The agent's logs said "PR-1234 reviewed in 3 turns, 14k tokens." That math should not add up to 5.2M unless the agent is reviewing roughly 370 PRs a month. The team ships about 80 PRs a month.
So I turned on per-call tracing for 30 days. By the end of the month I had found four bottlenecks the existing logs were structurally unable to surface. Together they were eating 47% of the monthly token bill while contributing zero new behavior. Fixing them cut the bill in half without changing what the agent does.
This post is the four bottlenecks, the trace query that found each one, and the fix.
What I instrumented








