Benchmarking a kill switch for runaway AI agents -- and why the real number is a ceiling, not a %

Claims about AI cost control are cheap. "Cut your agent spend by 60%!" is on every landing page. So instead of a claim, here's a benchmark you can run yourself in one command -- and an honest reading of what its number actually means, because the headline percentage is the least interesting part.

The short version: I ran the same looping agent twice -- once unguarded, once behind a hard dollar budget -- against a deterministic provider, and measured the spend. Then I'll show you why the "% saved" framing undersells it, and why a flat ceiling is the number that matters.

The setup

I wrote about why a runaway agent slips past logging, monitoring, and max_tokens: it's not one anomalous call, it's a thousand individually-valid ones, and the only thing that stops it is a deterministic, pre-call, per-run limit. This benchmark measures exactly that limit doing its job.

The harness (benchmark/ in the repo) is built so the only variable between the two runs is whether the budget fired:

The setup

The harness (benchmark/ in the repo) is built so the only variable between the two runs is whether the budget fired:

Benchmarking a kill switch for runaway AI agents -- and why the real number is a ceiling, not a %

Benchmarking a kill switch for runaway AI agents -- and why the real number is a ceiling, not a %

Related reading

How to Close the AI Agent Cost Gap at the Call Site

How I Slashed My AI API Bill by 92% in 2026 — A Cost Optimizer's Speed…

AI Agent Cost Is a Runtime Signal | Focused Labs

Observability told me exactly how much money my agents wasted. I wanted…

We burned 136 million tokens running an autonomous agent studio. Here's how we…

Your AI agent reports 80% task completion. It fabricated it.

Related reading

How to Close the AI Agent Cost Gap at the Call Site

How I Slashed My AI API Bill by 92% in 2026 — A Cost Optimizer's Speed…

AI Agent Cost Is a Runtime Signal | Focused Labs

Observability told me exactly how much money my agents wasted. I wanted…

We burned 136 million tokens running an autonomous agent studio. Here's how we…

Your AI agent reports 80% task completion. It fabricated it.