TL;DRAI

GAIA 2026 study on 165 multi-agent traces: 58.1% of tokens in failed runs burn AFTER the first error. waste_probe.py (40 lines) measures offline token waste. For agentic-system builders, measuring tail cost post-failure is critical—Dynamic Workflows (May 2026) multiply burn via fan-out to 1000+ subagents per run.

Wasted tokens after agent failure are the part nobody meters. A clean agent run and a failed one cost about the same to start; the bill diverges after the run is already lost. This post measures that tail — the token fraction your run keeps burning past its first failure signal — with a 40-line offline meter.

Here's the number that made me write this. In a 2026 paper on multi-agent observability, researchers measured 165 GAIA traces and found that among warned failed runs, 58.1% of tokens are spent after the first warning signal, on average. First the warning fires (a tool error, a loop, a budget-pressure flag), and then the agent keeps going for more than half the run's tokens before it stops. Read the citation carefully: that 58.1% is their number, on warned failed runs specifically, not all runs and not my measurement. I'll keep those separated all the way down.

The point I want to land: waste is not failure. The failure is the cheap part. What's expensive is the distance between "this run is clearly off the rails" and "the agent actually stopped." That gap is denominated in tokens, and you can measure it on your own logs in about a minute.

TL;DR

A failed agent run spends most of its tokens after the first detectable failure signal — the published figure is 58.1% on warned failed runs (arXiv 2606.01365, 165 GAIA traces).

dev.to

Your Failed Agent Run Burns Most of Its Tokens AFTER It Fails — Measure It in 40 Lines

Wasted tokens after agent failure are the tail, not the failure: a 40-line offline keyless meter for the token fraction burned after the first signal.

giovedì 18 giugno 2026 New tab

TL;DRAI

2,157 words~10 min read

TL;DR

A failed agent run spends most of its tokens after the first detectable failure signal — the published figure is 58.1% on warned failed runs (arXiv 2606.01365, 165 GAIA traces).

Your Failed Agent Run Burns Most of Its Tokens AFTER It Fails — Measure It in 40 Lines

Your Failed Agent Run Burns Most of Its Tokens AFTER It Fails — Measure It in 40 Lines

Related reading

Five ways your AI coding agent wastes tokens (and how to fix each one)

Your AI Agent Is Burning Tokens. Do You Know How Many?

BAGEN: LLM Agents Waste 44% of Tokens on Tasks They'll Fail

I Turned on Agent Tracing for 30 Days. 4 Hidden Bottlenecks Were Eating 47% of…

Your AI Agent Just Burned $108 in an Hour. Here's the 50-Line Fix.

Why the retry loop is usually the expensive part of agent work

Related reading

Five ways your AI coding agent wastes tokens (and how to fix each one)

Your AI Agent Is Burning Tokens. Do You Know How Many?

BAGEN: LLM Agents Waste 44% of Tokens on Tasks They'll Fail

I Turned on Agent Tracing for 30 Days. 4 Hidden Bottlenecks Were Eating 47% of…

Your AI Agent Just Burned $108 in an Hour. Here's the 50-Line Fix.

Why the retry loop is usually the expensive part of agent work