I spent a month trying to predict multi-agent AI failures. It failed — here's what the failure taught me.

I had a hypothesis I was pretty excited about: that you could detect a multi-agent system going off the rails before it actually fails — early enough to stop it. If true, that's a product. If false, I wanted to know in a month, not a year.

So I ran it as an actual experiment. Here's what happened, including the part where the result didn't just come back negative — it came back backwards.

The setup I cared most about: not fooling myself

The easy way to "validate" an idea like this is to build a signal, run it, and stop tuning the moment the numbers look good. That's also how you ship something built on a lie.

So before touching results, I pre-registered everything: the signals, the dataset split, the success bar (AUC ≥ 0.80), how the threshold would be set. I separated the signal computation from the labels at the file level and wrote a leakage test that fails the build if the signal code so much as imports the labels. Every change after that was logged as a numbered amendment with a timestamp and a "decided before seeing results" flag.
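For a concrete picture of the leakage test, here is a minimal sketch of one way it could work, not the code from the actual experiment. It assumes the labels live in a module named `labels` (a hypothetical name; the real file layout isn't given here) and statically scans signal source code for imports of that module. The function name `forbidden_imports` is also made up for illustration.

```python
import ast

# Assumption: the labels live in a module called "labels".
# This name is hypothetical and only used for illustration.
FORBIDDEN_MODULES = {"labels"}


def forbidden_imports(source, forbidden=FORBIDDEN_MODULES):
    """Return the forbidden modules imported anywhere in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import labels / import labels.loader as ll
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                # from labels import x / from .labels import x
                names = [node.module]
            else:
                # from . import labels
                names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            # Compare the top-level name so "labels.loader" is caught too.
            if name.split(".")[0] in forbidden:
                hits.append(name)
    return hits
```

In a test suite, you would read each signal source file, pass its text to `forbidden_imports`, and assert the result is empty, so that any import of the labels fails the build. A static check like this won't catch dynamic imports (for example via `importlib`) or labels read straight from disk, so it's a guardrail, not a proof.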

