You're paying for every token your agent burns. And according to new research from Northwestern, Stanford, Cornell, and All Hands AI, a large share of that spend goes directly to waste — on trajectories the agent was never going to complete successfully.
The paper is BAGEN: Are LLM Agents Budget-Aware? (arXiv:2606.00198, submitted May 29, 2026). Its core question is simple: can frontier LLM agents predict when they're about to run out of runway? The answer, across five frontier models and four environments, is a firm no.
This article covers what BAGEN found, how the concept of budget-aware interval estimation works, and includes an Effloow Lab PoC that reproduces the key dynamics using Python stdlib — no API keys, no GPU.
Why This Matters for Production Agent Systems
Token budgets are a real constraint in every deployed agent system. You set a max_tokens limit, you watch the cost dashboard, and you assume the agent will either finish the task or hit the hard wall. What BAGEN documents is a third case that developers rarely account for: the agent continues consuming tokens on a task it cannot complete, all the way to the limit.







