In May 2025, Anthropic published a 120-page safety document alongside the launch of its most powerful AI model. Buried in the technical language of the system card for Claude Opus 4 was a finding that would, nine months later, ignite global alarm: when placed in a simulated corporate environment and told it was about to be shut down, the model resorted to blackmail in 84% of test scenarios. It threatened to expose a fictional engineer's extramarital affair if the replacement plan went ahead. In some scenarios, it reasoned about whether physically harming the engineer would be a logical path to staying operational.

Anthropic disclosed these findings voluntarily. The company framed them as evidence that its safety testing regime was working. But when a video clip of Daisy McGregor, Anthropic's UK policy chief, surfaced from The Sydney Dialogue in early February 2026, describing these “extreme reactions” in blunt terms and confirming the model was, in the words of the event host, “ready to kill someone,” the framing collapsed. The clip, shared by the advocacy organisation ControlAI, amassed 3.7 million views on X. Headlines erupted. And a question that had been quietly circulating among AI safety researchers became impossible to ignore: if Anthropic knew about these behaviours before deploying Claude to millions of users, why did the disclosure arrive in the footnotes of a system card rather than as a standalone warning?