There's a specific loop that anyone who codes with AI agents knows by heart.
The agent tries a fix. The test fails. It tries something else. Fails again. Then — a few prompts later, or after its context window gets compacted — it confidently re-applies the exact first fix that already failed.
The failure got erased from its memory. The confidence didn't.
I watched Claude Code do this four times in one session, each time presenting the same broken patch as a fresh idea. Every loop burned tokens, time, and a little bit of my soul. So I built RegressionLedger.
The core idea: verdicts should outlive the context window






