I watched an LLM pipeline burn $400 in 90 minutes once. Not because the model was expensive, but because a single unhandled 429 rate-limit error triggered an infinite retry loop against GPT-4. No fallback. No circuit breaker. No cost alert. Just a runaway process that kept hammering the API until the billing dashboard lit up.

That happened early in my work on a job board platform, where I was running 10,000+ job listings a day through an LLM scoring pipeline. The system worked great in testing. In production, it found every edge case the API could throw at it.

Here's what I learned about making AI agents actually reliable.

The Retry Pattern That Doesn't Burn Money

Most retry logic I see in production code is naive: a try-catch wrapper with a fixed delay and a prayer. That works until you hit a sustained outage. Then every worker retries on the same fixed schedule, so the retries land in synchronized waves, a thundering herd against an already struggling API.
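To make the contrast concrete, here is a minimal sketch of one common alternative to a fixed delay: capped exponential backoff with full jitter, plus a hard limit on attempts. It is an illustration under assumptions, not the exact code from my pipeline. `RateLimitError`, `backoff_delay`, and `call_with_retries` are hypothetical names, and the defaults (5 attempts, 1 second base, 60 second cap) are placeholders to tune for your own API.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your API client raises on a 429."""


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full jitter: a random wait between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(fn, max_attempts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError, for at most max_attempts calls in total."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                # Retry budget exhausted: surface the error instead of looping forever.
                raise
            # Randomized, growing wait so parallel workers do not retry in lockstep.
            sleep(backoff_delay(attempt, base, cap))
```

The random wait spreads retries from parallel workers across time instead of stacking them on the same tick, and the attempt limit is what keeps a single 429 from turning into an unbounded loop. Passing `sleep` as a parameter is only there so the function can be tested without actually waiting.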