Nearly one in three attempts to hijack Anthropic’s newest AI browser agent succeeded before safeguards kicked in. That is not a rumor from a red-team Slack channel. It is a number Anthropic printed in its own system card.
The company released the Claude Opus 4.8 system card on May 28, spanning 244 pages and covering four agentic surfaces. The pre-safeguard hijack rate for the browser agent clocked in at 31.5%. To put that in plain terms: if a malicious actor pointed a prompt injection attack at the model while it was browsing the web, the attack worked roughly a third of the time, assuming no defensive layers were active.
The transparency gap across frontier labs
Here’s the thing. That 31.5% figure looks bad in isolation. But Anthropic is the only frontier lab that actually gave security professionals a concrete number to work with this spring.
OpenAI published a prompt injection disclosure that covered only one surface: connectors. Google moved the entire subject out of its model card and into a broader safety framework document, effectively diluting the specificity. Meta shipped no closed-model card at all.













