Your Agent Success Rate Counts Only the Survivors

Your agent dashboard says 90% success. It is wrong, and not because the math is sloppy. It is wrong because of which runs it forgot to count. Every run that timed out, got aborted, or is still stuck in RUNNING three hours later has quietly slipped out of the denominator. A run that FAILED is the honest one. It raised its hand, it sits in your error logs, it is already dragging the number down where it belongs. The run you should be scared of is the one that never came back to tell you anything.

That is survivorship bias, and it lives in almost every reliability number I have looked at.

TL;DR

A naive success rate divides wins by "runs that returned a clean pass or fail." That set excludes timed-out, aborted, and hung runs.

Excluded runs leave the denominator, so they inflate the rate by being invisible. The metric looks better the more runs disappear.

That is survivorship bias, and it lives in almost every reliability number I have looked at.

TL;DR

A naive success rate divides wins by "runs that returned a clean pass or fail." That set excludes timed-out, aborted, and hung runs.

Excluded runs leave the denominator, so they inflate the rate by being invisible. The metric looks better the more runs disappear.

Your Agent Success Rate Counts Only the Survivors

Your Agent Success Rate Counts Only the Survivors

Other newsrooms on this story

Related reading

A 58% Win-Rate Over Zero Closed Trades: Recompute Agent Scorecard

Your AI agent calls the wrong tool — and your JSON schema is usually why

Your AI agent reports 80% task completion. It fabricated it.

Your Agent's Retries Are Double-Charging Your Users (and Every Eval Is Green)

Your Failed Agent Run Burns Most of Its Tokens AFTER It Fails — Measure It in…

Ten 95% Reliable Agents Chained Together Give You a 60% System. Microservices…

Related reading

A 58% Win-Rate Over Zero Closed Trades: Recompute Agent Scorecard

Your AI agent calls the wrong tool — and your JSON schema is usually why

Your AI agent reports 80% task completion. It fabricated it.

Your Agent's Retries Are Double-Charging Your Users (and Every Eval Is Green)

Your Failed Agent Run Burns Most of Its Tokens AFTER It Fails — Measure It in…

Ten 95% Reliable Agents Chained Together Give You a 60% System. Microservices…

Other newsrooms on this story