There's a specific loop that anyone who codes with AI agents knows by heart.

The agent tries a fix. The test fails. It tries something else. Fails again. Then — a few prompts later, or after its context window gets compacted — it confidently re-applies the exact first fix that already failed.

The failure got erased from its memory. The confidence didn't.

I watched Claude Code do this four times in one session, each time presenting the same broken patch as a fresh idea. Every loop burned tokens, time, and a little bit of my soul. So I built RegressionLedger.

The core idea: verdicts should outlive the context window