I Let Claude Design 4 Chaos Experiments via MCP. The 4th Took Down Staging and Found a 6-Month-Old Bug.

Every experiment in this post ran in staging. Production was double-locked: a ## Chaos Rules block in CLAUDE.md forbidding production targets, and a PreToolUse hook that exits 2 if --env=production shows up in any chaos command. I'll show both at the end. The point of saying this upfront is that "I let Claude design chaos experiments" is the kind of sentence people read sideways. The TL;DR is: staging only, twice-locked, and the whole exercise was supervised end to end.

With that out of the way: Steadybit released what is widely described as the first chaos engineering MCP server in mid-2025. I plugged Claude Code into it and asked, in a single sentence, to design experiments that test payment-service's resilience under connection-pool stress. Claude proposed four of them. Three came back without an SLO breach. The fourth took staging down completely. When I traced the failure, it wasn't a contrived test bug. It was a real production pattern that had been flickering in our logs for 6 months and that we had never been able to reproduce: pool exhaustion → retry storm → rate limiter self-DoS. Here is the run, the bug, and the three guardrails I now require before letting any AI design chaos experiments.

I Let Claude Design 4 Chaos Experiments via MCP. The 4th Took Down Staging and Found a 6-Month-Old Bug.

I Let Claude Design 4 Chaos Experiments via MCP. The 4th Took Down Staging and Found a 6-Month-Old Bug.

Related reading

I Let Claude Code Run Unsupervised for 24 Hours. Here's What Happened.

I let Claude Code run --dangerously-skip-permissions on my production DB.…

Claude Code, Beyond the Prompt — Part 4: Your First MCP Server (Give Claude…

We Ran 4 Claude Code Dialogs for 28 Hours. Here's What the Memory Layer Caught…

How I Set Up Claude Code with 26 Production Subagents (CLAUDE.md, MCP, Hooks)

The bugs I could only find by running the thing

Related reading

I Let Claude Code Run Unsupervised for 24 Hours. Here's What Happened.

I let Claude Code run --dangerously-skip-permissions on my production DB.…

Claude Code, Beyond the Prompt — Part 4: Your First MCP Server (Give Claude…

We Ran 4 Claude Code Dialogs for 28 Hours. Here's What the Memory Layer Caught…

How I Set Up Claude Code with 26 Production Subagents (CLAUDE.md, MCP, Hooks)

The bugs I could only find by running the thing