Anthropic: 'We made the wrong tradeoff' in new model guardrails

Anthropic walked back a new policy it had for AI research.

Bloomberg/Getty Images

Anthropic just flip-flopped on a policy that was silently limiting what some AI researchers could do with its new Claude Fable 5 model.

Earlier this week, the AI lab released Claude Fable 5, a public version of the Mythos model designed with extra safety measures to prevent misuse.

At release, Anthropic said that it took precautions like rerouting questions about cybersecurity, biology, and chemistry to less capable models to ensure people cannot use the advanced model to plan cyberattacks or build a bioweapon.

The lab also said that for those trying to use Fable 5 for AI development, the company would degrade the model's performance without explaining the change to the user. Some in the developer community saw the move as a quiet way to prevent others from creating rival AI systems, Business Insider previously reported.

Wednesday's statement reverses that move: Fable 5 will now tell users whether their prompt is being refused or rerouted.

"We're changing Fable 5's safeguards for frontier LLM development to make them visible," an Anthropic spokesperson said in a statement to Business Insider on Wednesday. "Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged requests will return a reason for their refusal."

The company added, "We made the wrong tradeoff, and we apologize for not getting the balance right."
