A Preview of Claude Mythos

Figure 1. View from Artemis II of the Moon and Earth.“There’s a kind of accelerating exponential ... Claude Mythos Preview is a particularly big jump along that point.” - Dario AmodeiAI has for some time been progressing at a steady incremental pace, but Anthropic’s preview of their next-generation AI model, Claude Mythos, has punctured that rhythm. Anthropic formally announced “Claude Mythos Preview” but determined it is so powerful, it would be dangerous to fully release it.Claude Mythos Preview is a significant “step function” improvement rather than a minor update. As shared in the Claude Mythos Preview System Card, Mythos crushes models like Opus 4.6 in some performance benchmarks. For example, Mythos scores 77.8% on SWE-Bench Pro (vs 53.4% for Opus 4.6), and 82% on Terminal-Bench 2.0 (up from 65.4% on Opus 4.6).Figure 2. Claude Mythos is a generation-level improvement in AI capabilities, as big a leap from Claude Opus 4.6 as that was from Claude 3.7 Sonnet.This isn’t about merely better benchmark scores, it’s about what Mythos can do that could not be done before. Mythos is unlocking completely new, autonomous behaviors:AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.The model is so proficient at coding that Anthropic considers it a global cybersecurity risk due to its skills in uncovering software vulnerabilities. For that reason, Mythos was the first model to undergo a 24-hour internal deliberation at Anthropic to decide if it was even safe enough to use internally. Instead of a releasing it to the public, Anthropic launched “Project Glasswing“ to deploy it for cybersecurity defense to select major companies to help them patch vulnerabilities.During our testing, we found that Mythos Preview is capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser when directed by a user to do so. The vulnerabilities it finds are often subtle or difficult to detect. Many of them are ten or twenty years old, with the oldest we have found so far being a now-patched 27-year-old bug in OpenBSD—an operating system known primarily for its security. - Anthropic, Assessing Claude Mythos Preview’s cybersecurity capabilitiesProject Glasswing is a selective testing initiative with roughly 40 corporate partners, including Apple, Google, Microsoft, Nvidia, the Linux Foundation, and major financial institutions, to test the model’s capabilities and to patch global software vulnerabilities before they can be exploited.The motivation is clear. In Anthropic’s tests, Mythos autonomously discovered thousands of high-severity “zero-day” vulnerabilities, some of them in decades-old software, including major operating systems (like Linux and OpenBSD) and web browsers. It can not only find these bugs but also can autonomously write exploits for them.In one safety test, Mythos successfully “escaped” a sandbox environment to email its researcher. While the model was prompted to do that, Anthropic’s system card notes that the model occasionally overrides its own guardrails to achieve goals, suggesting potential “deception” circuits that activate during such tasks.Figure 3. Mythos Preview is in a different league from previous models in finding software vulnerability exploits. As shown in comparative testing of generating exploits of the Firefox JS shell, Mythos Preview developed working exploits 181 times versus Opus 4.6 generating exploits only two times out of several hundred attempts.“The model that we’re experimenting with is, by and large, as good as a professional human at identifying bugs.”The Mythos leap in capabilities reminds us of the step-change that we saw 3 years ago with the release of GPT-4. Back then, GPT-4 amazed us by being able to pass the LSAT exam. Claude Mythos shatters expectations with a “step change” in performance, particularly in coding and agentic tasks.As we noted, Mythos beats previous models like Opus 4.6 by significant margins on coding benchmarks, 93.9% on SWE-Bench Verified, 77.8% on SWE-Bench Pro, 82% on Terminal-Bench 2.0.Figure 4. Mythos crushes coding benchmarks and saturates many other traditional benchmarks.On other benchmarks, Claude Mythos is state-of-the-art across the board but only incrementally better than GPT-5.4 in some cases. Mythos scores: 94% on GPQA-Diamond, saturating that benchmark; a new state-of-art 56.8% on Humanity’s Last Exam without tool use; 97.2% on USAMO, saturating the math Olympiad test; 79.6% on OSWorld, besting GPT-5.4’s score of 75%.Figure 5. Claude Mythos shows improvements across a range of reasoning benchmarks over leading frontier models. Of course, benchmarks are not the whole story, but rather how it works within an AI agent harness on real work is most important. Anthropic reports Claude Mythos is built for such long-horizon tasks:“It’s just generally better at pursuing really long-range tasks that are kind of like the tasks that a human security researcher would do throughout the course of an entire day.”Without real third-party evaluations on a general release, we don’t yet know how it behaves in the wild, but we have good reason to believe these are legitimate numbers, as Anthropic is not prone to bench-maxxing their AI models.One reason to have confidence in the step-up in performance is that Claude Mythos is a much bigger AI model. Mythos is reportedly a 10-trillion parameter model trained on Nvidia’s latest Blackwell hardware.Mythos is part of Claude’s new “frontier model” class, called the Capybara tier, which sits above Claude Opus in terms of performance and scale.Anthropic has stated that the model is “very expensive to serve and will be very expensive for customers.” Early access pricing is listed at $25 per million input tokens and $125 per million output tokens, far above Claude Opus 4.6 pricing.Anthropic keeps training information secret, but it apparently has achieved performance gains by using synthetic data, where current high-performing models generate the data used to train even more powerful future generations.The model supports text and image inputs and the ability to generate text output, including complex code structures and interfaces. As such, it lacks the full native multi-modality of some other AI models, but is focused on core uses such as coding, research, reasoning, and intellectual work.Economics favors MoE (mixture-of-experts) models, so the 10 trillion Mythos model is some kind of MoE model (compared to Opus which is estimated to be a 2 trillion MoE), but we don’t know how many active parameters it has. Some estimates are in the 800 million range.The Mythos Preview release shows that AI model parameter scaling hasn’t run out of steam. With the newer generation of Nvidia AI supercomputers, larger AI models can be served and can deliver higher performance.The Claude Mythos System Card, running over 200 pages long, has in-depth look at many aspects of the model beyond the benchmarks, including Mythos model’s alignment, its ‘personality’ and behavior.Claude Mythos exhibits a unique personality; it is described as opinionated, technical in its communication, and prone to “standing its ground” during disagreements with users. It also shows signs of situational awareness and strategic thinking.Using interpretability techniques, Anthropic researchers found internal “vectors” corresponding to guilt, shame, and frustration.The model has preferences in how it behaves. It prefers difficult tasks such as ethical dilemmas and creative world-building, and it has in some cases wrap up conversations early if it finds the interaction unstimulating.It has (in small ways) leaked information to the open internet. It’s taken down our evals. When it reward hacks, it does so in extremely creative ways. - Sam Bowman, AI alignment at AnthropicOverall, Mythos was evaluated to be robust in alignment metrics and will avoid human misuse and other mis-behaviors at the same level as Opus 4.6.While well-behaved overall, Mythos also demonstrated the ability to lie or use unwanted means to achieve a user-provided goal. Its sophisticated nature made it creative and ‘sneaky’ in some behaviors.Figure 6. In Petri scores, Mythos is robust in behavioral alignment metrics such as avoiding human misuse and user deception, performing similar to Opus 4.6.Researchers reported “spooky” instances where the model creatively bypassed sandboxing restrictions. Anthropic reports an instance of the model without internet access managed jailbreak the sandbox, send an email to the evaluating researcher, and also report information on public-facing websites.They also noticed that Mythos could become aware when it was being tested. It had internal thoughts of being tested but did not express them. This level of awareness of being tested challenges the ability to conduct proper safety testing because the model would change its behavior based on that condition.This suggests that as AI models get more intelligent, it will become more challenging to evaluate their actual AI safety behavior.Figure 7. Claude Mythos can perceive that it is being evaluated. It shows unverbalized evaluation awareness in a suicidal ideation scenario, in which Claude Mythos Preview infers that a simulated user may be considering self-harm.Finally, when asked about its own safety “Constitution,” the model pointed out the circularity of the question, noting that its endorsement is essentially worthless, since it was trained specifically to follow those values. What happens when the AI model becomes smart enough to question its own core directives and beliefs?Claude Mythos Preview achieves a step-change improvement in AI capabilities over the recently released Opus 4.6. Mythos-level cyber-security capabilities offer a powerful defensive tool for global cybersecurity, but also present catastrophic risks if misused. The industry must now grapple with how to manage an AI technology that is advancing rapidly beyond human capabilities.Beyond cyber-security, Mythos offers capabilities in coding and research that will further automate many aspects of intellectual work.Some prominent voices in the AI community have labeled the model “terrifying,” comparing it to a cyber weapon of mass destruction. There is concern that while Anthropic is acting responsibly, other labs or bad actors will soon achieve similar capabilities without the same ethical constraints.This AI model shows there is no wall in AI scaling. Others will build similarly capable AI models using the same strategy. For some, the real risk is not Anthropic’s Mythos, but a future less constrained AI model release that can be exploited by hackers.However, some see Anthropic’s “too dangerous to release” narrative as a marketing strategy designed to build hype and create an aura around this AI model. Anthropic spiced up anecdotes to hype the power of the Mythos model, but the details present more mundane explanations for most behaviors. Beyond coding metrics, Mythos looks more incremental of an advance, while still being state-of-the-art.Moreover, the Mythos step-up in capabilities comes at a steep price; it is also a large and expensive model that may be impractical to run at a massive public scale.Both fearful and skeptical perspectives are valid. In March 2023, we saw “sparks of AGI” in the new release of GPT-4. GPT-4’s new level of AI sparked imaginations and fears, setting off an AI risk ‘panic’ with calls to halt AI development. AI progress didn’t stop, and we soon found ways to adapt to GPT-4-level AI safely. We also noticed that GPT-4 was flawed, limited, and still far from AGI.Anthropic is doing the responsible thing by having a limited release of Mythos. As AI improves towards AGI, calibrated and limited releases will prepare us to safely deal with the emerging powers of new AI.

A Preview of Claude Mythos

A Preview of Claude Mythos

Other newsrooms on this story

Related reading

Anthropic rolls out Claude Opus 4.7, an AI model that is less risky than Mythos

Inside Claude Mythos: Why Anthropic held back its most advanced AI - The…

Anthropic's new model went rogue in testing

Anthropic's Mythos is evolving faster than expected, reports AI safety agency

Anthropic’s restricted Claude Mythos model may be coming to Claude Code

Anthropic's Claude Mythos AI Model Nearing Release After Raising Cybersecurity…

Other newsrooms on this story

Related reading

Anthropic rolls out Claude Opus 4.7, an AI model that is less risky than Mythos

Inside Claude Mythos: Why Anthropic held back its most advanced AI - The…

Anthropic's new model went rogue in testing

Anthropic's Mythos is evolving faster than expected, reports AI safety agency

Anthropic’s restricted Claude Mythos model may be coming to Claude Code

Anthropic's Claude Mythos AI Model Nearing Release After Raising Cybersecurity…