You don't actually care which model your agent runs on. You care that the thing you set up last month is still doing its job this morning — triaging the inbox, chasing the unpaid invoice, posting the standup summary — without you hovering over it. That's the entire promise of an autonomous agent: configure it once, then trust it to run.

So here's the uncomfortable pattern every operator eventually hits: the demo works flawlessly, and the week-two version quietly falls over. The agent that wowed you on Tuesday is silently stuck on Friday, and you only find out because the invoice didn't go out.

The instinct is to blame the model — "it got dumber," "I picked the wrong one." Almost always, that's wrong. Reliability at this layer is an operations problem, not a model problem. Here's the math that explains why.

The compounding trap

Agents don't do one thing. They do a chain of things: read an email, call an API, parse the result, decide, take an action, confirm. Each link in that chain has some probability of succeeding. And probabilities multiply.