Claude Fable 5 Was Jailbroken in 48 Hours. Here's What Actually Stopped Nothing.

Anthropic spent 1,000 hours running an external red-team bounty before launching Claude Fable 5. The claim coming out of that program: no universal jailbreaks found. Within 48 hours of public release, a researcher known as Pliny the Liberator publicly claimed to have bypassed those guardrails anyway.

The techniques weren't exotic. They were a layered combination of Unicode/homoglyph substitution, long-context framing, narrative fiction framing, and a decomposition-recomposition strategy — breaking a harmful request into a series of individually innocuous-seeming sub-prompts. The use cases claimed were serious: drug synthesis assistance and attacks on crypto protocols.

This isn't an indictment of Anthropic specifically. It's a structural problem. Model-layer guardrails are a single point of failure, and they're always going to lose to researchers with enough time and creativity. The question is what you put in front of the model.

How the Attack Actually Worked

Based on what's been reported, Pliny combined at least four distinct evasion techniques simultaneously:

Claude Fable 5 Was Jailbroken in 48 Hours. Here's What Actually Stopped Nothing.

Other newsrooms on this story

Related reading

Researcher Jailbreaks Claude Fable 5 Within 48 Hours of Launch

Claude Fable 5 and Mythos 5: Capabilities

Anthropic disputes jailbreak allegations against Claude Fable 5

I Checked Why Claude Fable 5 Was Suspended 4 Days After Launch. This Is Not an…

Anthropic Restores Claude Fable 5 After U.S. Lifts Jailbreak-Linked Export…

Anthropic Disputes Fable 5 AI Jailbreak