Anthropic Apologizes for Claude Fable 5 Secret Censorship—But the Fix Has a Catch - Decrypt

In brief

Anthropic admitted its invisible LLM-development safeguards were "the wrong tradeoff" and will replace them with visible fallbacks to Claude Opus 4.8, starting this week.

Flagged requests on the API will now return a reason for their refusal, rather than silently delivering a degraded answer.

Making the safeguards visible means they'll be easier to work around.

Anthropic spent about 48 hours as the AI industry's villain of the week before blinking.The company launched Claude Fable 5 this week to immediate backlash over a safeguard buried in its 319-page system card: The model, the first of the company’s new Mythos class, would secretly degrade its own responses for users it suspected were building competing AI models—no warning, no fallback message, just quietly worse output. By Thursday, Anthropic was apologizing.

Anthropic Apologizes for Claude Fable 5 Secret Censorship—But the Fix Has a Catch - Decrypt

Other newsrooms on this story

Related reading

Anthropic apologizes for Claude Fable 5 censorship, promises fixes

Anthropic backpedals on Fable safety measure

Anthropic faces backlash over stricter safeguards on Claude model

Claude Fable 5: Anthropic admits "wrong tradeoff" after invisibly throttling…

Anthropic to reassess Claude Fable 5 AI development restrictions after backlash

Anthropic accused of 'secret sabotage' as Claude Fable 5 silently limits…