A few weeks ago I started building SafeRun: inline reliability infrastructure for AI agents in production. The temptation, when you're building something in the agent reliability space, is to lead with validation. Block the bad action before it happens. Stop the runaway loop. Enforce the policy.

These are real features. SafeRun ships all of them. But they're not the first thing we built. The first thing we built was Replay.

Here's why.

The failure mode no one talks about

Most teams shipping AI agents into production discover the same problem after their first bad incident. The agent did something it shouldn't have. They go to investigate. And they find that they can't reproduce what happened.