In brief
Anthropic admitted its invisible LLM-development safeguards were "the wrong tradeoff" and will replace them with visible fallbacks to Claude Opus 4.8, starting this week.
Flagged requests on the API will now return a reason for their refusal, rather than silently delivering a degraded answer.
Making the safeguards visible means they'll be easier to work around.
Anthropic spent about 48 hours as the AI industry's villain of the week before blinking.The company launched Claude Fable 5 this week to immediate backlash over a safeguard buried in its 319-page system card: The model, the first of the company’s new Mythos class, would secretly degrade its own responses for users it suspected were building competing AI models—no warning, no fallback message, just quietly worse output. By Thursday, Anthropic was apologizing.










