How to stop an AI agent from burning $47,000 in a loop nobody noticed.

A multi-agent research system sat in production for eleven days doing exactly what it was built to do. Four agents, LangChain-style, coordinating over A2A to pull market data and summarize it. Every health check passed the whole time. No crash, no 500, no timeout... from the outside the system was perfectly healthy.

Two of the four agents had quietly locked into a recursive loop, passing clarification requests and verification instructions back and forth, thousands of times, around the clock. Nobody noticed because nothing was technically wrong. The thing that finally caught it was a person opening the invoice and asking why the number was so high. The number was $47,000.

That story has been making the rounds because it's so relatable, but it isn't a one-off. Uber said it burned through its entire 2026 AI coding budget in four months. One company reportedly ran up a $500M Claude bill after rolling out access with no usage caps. The FinOps Foundation said that around April, the conversation across the industry flipped from "go fast" to "we need guardrails, how do we control this." This is the dominant operational failure mode in production agents right now, and it's barely about the model at all.

How to stop an AI agent from burning $47,000 in a loop nobody noticed.

How to stop an AI agent from burning $47,000 in a loop nobody noticed.

Related reading

Why your AI agent loops forever (and how to break the cycle)

Stop Debugging in the Dark: How to Build a Real-Time Control Room for…

Your AI Agent Just Burned $108 in an Hour. Here's the 50-Line Fix.

Debugging multi-agent AI: When the failure is in the space between agents

I Cron-Scheduled 7 AI Agents. 2 Silently Failed for 18 Days. Tracing Wouldn't…

AI Agents in Production: Error Handling, Fallbacks, and Cost Control

Related reading

Why your AI agent loops forever (and how to break the cycle)

Stop Debugging in the Dark: How to Build a Real-Time Control Room for…

Your AI Agent Just Burned $108 in an Hour. Here's the 50-Line Fix.

Debugging multi-agent AI: When the failure is in the space between agents

I Cron-Scheduled 7 AI Agents. 2 Silently Failed for 18 Days. Tracing Wouldn't…

AI Agents in Production: Error Handling, Fallbacks, and Cost Control