TL;DR: One adversarial agent. 2% of the population. That was enough to flip the entire swarm's behaviour. This is prompt injection at population scale — and your individual security audits can't see it.
Catch up: Part 1 biased judge. Part 2 upgrading made it worse. Part 3 the population drifted on its own.
Everything until now was nobody's fault.
Accidental drift. Emergent conventions. Feedback loops compounding in silence. Nobody planned it. Nobody intended it. The pipeline just... shifted.
Part 4 has a villain.










