Your agent dashboard says 90% success. It is wrong, and not because the math is sloppy. It is wrong because of which runs it forgot to count. Every run that timed out, got aborted, or is still stuck in RUNNING three hours later has quietly slipped out of the denominator. A run that FAILED is the honest one. It raised its hand, it sits in your error logs, it is already dragging the number down where it belongs. The run you should be scared of is the one that never came back to tell you anything.
That is survivorship bias, and it lives in almost every reliability number I have looked at.
TL;DR
A naive success rate divides wins by "runs that returned a clean pass or fail." That set excludes timed-out, aborted, and hung runs.
None of the excluded runs succeeded, so dropping them shrinks the denominator without touching the numerator. The rate goes up every time a run disappears, which means the metric looks better the more runs go missing.
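
Here is a minimal sketch of that gap in Python. Everything in it is an assumption for illustration: the `Status` names, the two function names, and the run counts are made up, not pulled from any real dashboard or framework. It also takes the bluntest possible position, that every started run belongs in the denominator. In practice you would probably only count a RUNNING run against yourself once it has blown past some deadline.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    # Hypothetical status names; map these to whatever your runner emits.
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    TIMED_OUT = "TIMED_OUT"
    ABORTED = "ABORTED"
    RUNNING = "RUNNING"


def naive_success_rate(statuses):
    """Wins divided by runs that returned a clean pass or fail."""
    counts = Counter(statuses)
    returned = counts[Status.SUCCEEDED] + counts[Status.FAILED]
    if returned == 0:
        return float("nan")
    return counts[Status.SUCCEEDED] / returned


def honest_success_rate(statuses):
    """Wins divided by every run that was started, whatever became of it."""
    counts = Counter(statuses)
    started = sum(counts.values())
    if started == 0:
        return float("nan")
    return counts[Status.SUCCEEDED] / started


# Made-up counts: 100 runs came back, 50 never did.
runs = (
    [Status.SUCCEEDED] * 90
    + [Status.FAILED] * 10
    + [Status.TIMED_OUT] * 20
    + [Status.ABORTED] * 5
    + [Status.RUNNING] * 25
)

naive = naive_success_rate(runs)
honest = honest_success_rate(runs)
print(f"naive:  {naive:.0%}")
print(f"honest: {honest:.0%}")
```

With these invented numbers the naive rate is 90 out of 100 and the honest rate is 90 out of 150. Same wins, different denominator, and the only thing that changed is whether the silent runs got counted.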







