TL;DR: Five parts of bad news. Here's what actually helps, with code. Cross-family judges reduce the core bias. Structured multi-dimensional evaluation cuts that bias by 31.5%. Chain-of-thought adds 1.5 to 13 accuracy points. Population monitoring catches drift before it locks in. Full implementation patterns below. Copy them.
The series: Part 1: the biased judge. Part 2: the upgrade made it worse. Part 3: the population drifted. Part 4: adversarial takeover at 2%. Part 5: the regulation has holes. Part 6: what you can actually do about it.
You made it.
Six weeks of finding out that your pipeline was biased, then more biased, then collectively biased, then adversarially vulnerable, then unauditable under current law.
Good news: some things actually help.
