Anthropic’s newest AI model was live for roughly 24 hours before users figured out something was off. Claude Fable 5, the company’s first Mythos-class model available to the general public, had been quietly rerouting certain queries to a less capable model without telling anyone. When the AI community noticed the performance drops and called it out, Anthropic did something rare in tech: it admitted the mistake.
The company has now promised visible safeguards going forward, meaning users will actually know when their queries are being flagged or redirected. The catch? Anthropic warned that fixing the transparency problem will come with a side effect: more false positives as the company recalibrates its classifiers. In English: expect the system to occasionally flag perfectly innocent questions while it learns to better distinguish between genuine risk and a biology student doing homework.
What actually happened
Claude Fable 5 launched on June 9, 2026, marketed as a major leap in Anthropic’s model lineup. Under the hood, the Mythos-class architecture included a new safety layer designed to handle high-risk queries, particularly in sensitive domains like cybersecurity and biology.
When the system detected a potentially dangerous prompt, it would silently redirect the conversation to Claude Opus 4.8, an older, less capable model. No notification, no explanation, no opt-out. The user just got worse answers and had no idea why.











