Wasted tokens after agent failure are the part nobody meters. A clean agent run and a failed one cost about the same to start; the bill diverges after the run is already lost. This post measures that tail — the token fraction your run keeps burning past its first failure signal — with a 40-line offline meter.

Here's the number that made me write this. In a 2026 paper on multi-agent observability, researchers measured 165 GAIA traces and found that among warned failed runs, 58.1% of tokens are spent after the first warning signal, on average. First the warning fires (a tool error, a loop, a budget-pressure flag), and then the agent keeps going for more than half the run's tokens before it stops. Read the citation carefully: that 58.1% is their number, on warned failed runs specifically, not all runs and not my measurement. I'll keep those separated all the way down.

The point I want to land: waste is not failure. The failure is the cheap part. What's expensive is the distance between "this run is clearly off the rails" and "the agent actually stopped." That gap is denominated in tokens, and you can measure it on your own logs in about a minute.
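To make that gap concrete, here is a minimal sketch of the quantity in question: the fraction of a run's tokens spent after the first warning signal. It is not the 40-line meter itself. The log shape (a list of `(tokens, is_warning)` pairs per step), the function name `post_signal_fraction`, and the choice to count the warning step's own tokens as "before" are all assumptions made for illustration; your logs will likely need their own adapter.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical step record: (tokens used in this step, did a warning fire here?)
Step = Tuple[int, bool]


def post_signal_fraction(steps: Iterable[Step]) -> Optional[float]:
    """Fraction of a run's tokens spent after its first warning signal.

    Returns None when the run never warned or used no tokens, so that
    unwarned runs are not silently counted as 0% waste.
    """
    total = 0
    after = 0
    warned = False
    for tokens, is_warning in steps:
        total += tokens
        if warned:
            # Every step after the first warning counts toward the tail.
            after += tokens
        elif is_warning:
            # Assumption: the warning step's own tokens count as "before".
            warned = True
    if not warned or total == 0:
        return None
    return after / total


# Made-up run: the warning fires on step 2, then the agent keeps going.
run = [(1000, False), (500, True), (1500, False), (1000, False)]
print(post_signal_fraction(run))  # 0.625
```

In this made-up run, 2,500 of 4,000 tokens come after the warning, so the tail is 62.5%. That number is only an illustration of the arithmetic, not a measurement.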

TL;DR

A failed agent run can spend most of its tokens after the first detectable failure signal: the published figure is 58.1% on average, for warned failed runs specifically (arXiv 2606.01365, 165 GAIA traces).