From GPT-2 to Claude Mythos: The return of AI models deemed 'too dangerous to release'

The industry chose guardrails over withholding

The idea of gradually releasing a model didn't catch on. Instead, the industry settled on a different answer to the safety question: don't withhold - secure and then release. Red teaming before launch, safety evaluations, system cards, responsible scaling policies, bug bounty programs, and RLHF-based safety layers became standard practice. GPT-3 was made accessible via API, ChatGPT launched as a public product, and Meta released its LLaMA models as open models. The logic: if you thoroughly test a model and equip it with safety measures, you can responsibly ship it.

Deep learning engineer Chip Huyen, then at Nvidia, pretty much nailed it back in 2019: "I don't think a staged release was particularly useful in this case because the work is very easily replicable. But it might be useful in the way that it sets a precedent for future projects." The precedent came, just not in the way anyone expected - not as withholding, but as securing before release.

Clark himself left OpenAI in December 2020. A few months later, he emerged as a co-founder of Anthropic, a company founded by former OpenAI employees including siblings Daniela and Dario Amodei. Anthropic became a major driver of the industry's safety practices: Constitutional AI, its Responsible Scaling Policy, and extensive system cards before each launch became hallmarks of the company, often spearheaded by former OpenAI employees who disagreed with CEO Sam Altman's "vibes" and his rather dismissive attitude toward traditional safety approaches.

From GPT-2 to Claude Mythos: The return of AI models deemed 'too dangerous to release'

Other newsrooms on this story

Related reading

How Dangerous Is Anthropic's Mythos AI? - Schneier on Security

New Claude Mythos becomes the first AI model to clear all cyberattack…

OpenAI's GPT-5.5 is as Good as Mythos at Finding Security Vulnerabilities -…

Why Anthropic believes its latest model is too dangerous to release

OpenAI makes its rival to Anthropic's Mythos more widely available to cyber…

Anthropic's Mythos is evolving faster than expected, reports AI safety agency