Anthropic is changing course after facing criticism for quietly downgrading certain requests to its most capable AI model.
On Tuesday, the $965 billion company released a version of its most capable model, Mythos. Anthropic revealed Mythos in April, but held back any Mythos-class models from the public partly because the company said it was extremely adept at skirting cybersecurity defenses and was too dangerous to release.
This week, though, it opted to release the Mythos-class model, Fable 5, even as its capabilities “exceed those of every model we’ve previously made generally available,” according to Anthropic. Dianne Na Penn, Anthropic’s head of product management, research, and labs, previously told Fortune the company felt comfortable releasing Fable 5 because it feels “more confident with our safety guardrails in place.”
Yet, it was one of those guardrails, buried in a 319-page safety document, that earlier this week prompted a wave of backlash from AI researchers and other users online, and has now prompted the company to improve its transparency.
Information found in the Fable 5’s system card, a long document of safety disclosures, revealed the model would silently downgrade some requests related to advanced AI development. If, for example, an AI researcher is using Fable 5 to build their own AI, the program would default to a less capable model.














