Anthropic quietly built guardrails into its latest AI models that would degrade performance whenever someone tried to use them for building rival AI systems. Then researchers found out, and things got uncomfortable.
The company has now revised its approach to the controversial restrictions, which were embedded in its Mythos and Fable model families, after a wave of criticism from the AI research community. The safeguards, first disclosed in system cards published in early June 2026, used techniques like prompt modification and steering vectors to intentionally diminish Claude’s effectiveness on tasks central to large language model development, including pretraining pipelines and ML accelerator design.
What Anthropic actually did
Here’s the thing. Companies routinely include terms of service that prohibit customers from using their products to build competing offerings. That’s standard corporate defense. What made Anthropic’s approach different was the method: rather than simply banning the behavior in legal documents, the company baked the restrictions directly into the model’s behavior.
In English: if Claude detected you were trying to build a competing AI system, it would quietly become worse at helping you. Not refuse outright. Just… underperform. Like a contractor who doesn’t want the job but won’t say no.











