I had a hypothesis I was pretty excited about: that you could detect a multi-agent system going off the rails before it actually fails — early enough to stop it. If true, that's a product. If false, I wanted to know in a month, not a year.
So I ran it as an actual experiment. Here's what happened, including the part where the result didn't just come back negative — it came back backwards.
The setup I cared most about: not fooling myself
The easy way to "validate" an idea like this is to build a signal, run it, and stop tuning the moment the numbers look good. That's also how you ship something built on a lie.
So before touching results, I pre-registered everything: the signals, the dataset split, the success bar (AUC ≥ 0.80), how the threshold would be set. I separated the signal computation from the labels at the file level and wrote a leakage test that fails the build if the signal code so much as imports the labels. Every change after that was logged as a numbered amendment with a timestamp and a "decided before seeing results" flag.






