Anthropic disputes jailbreak allegations against Claude Fable 5

Anthropic is pushing back on claims that its newest AI model, Claude Fable 5, was successfully jailbroken just days after going live.

Claude Fable 5 launched on June 9, 2026, as Anthropic’s publicly accessible version of its Mythos-class model architecture. Within roughly 48 hours, reports began circulating that researchers had found ways to bypass its security guardrails using novel multi-agent techniques.

What happened and what Anthropic is saying

Anthropic built Claude Fable 5 with a layered safety architecture that includes classifiers designed to intercept high-risk queries. When someone asks about a sensitive topic such as cybersecurity exploits or chemical synthesis, the query is routed away from the main model to Claude Opus 4.8, a less capable system with tighter restrictions.
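
For readers who want a concrete picture of classifier-based routing, the Python sketch below shows the general pattern only. It is not Anthropic's implementation: the keyword list, the function names and the model labels are all hypothetical, and a production classifier would be a trained model rather than a keyword match.

```python
from dataclasses import dataclass

# Hypothetical trigger phrases; a real safety classifier would be a trained model.
HIGH_RISK_KEYWORDS = ("exploit", "malware", "chemical synthesis")

# Illustrative labels only, not real API identifiers.
MAIN_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4.8"


@dataclass
class RoutingDecision:
    model: str
    flagged: bool


def is_high_risk(query: str) -> bool:
    """Toy stand-in for a safety classifier: flag queries containing a trigger phrase."""
    lowered = query.lower()
    return any(keyword in lowered for keyword in HIGH_RISK_KEYWORDS)


def route(query: str) -> RoutingDecision:
    """Send flagged queries to the more restricted fallback model, others to the main model."""
    flagged = is_high_risk(query)
    model = FALLBACK_MODEL if flagged else MAIN_MODEL
    return RoutingDecision(model=model, flagged=flagged)
```

In this toy version, `route("Write a poem about autumn")` stays on the main model, while a query mentioning an exploit is diverted to the fallback. The point is that the routing decision is made before the main model ever sees the query.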

Before launch, Anthropic ran over 1,000 hours of external bug bounty testing. The company says more than 30 known jailbreak techniques were thrown at the model. None produced a universal bypass.
