Storia in 2 fonti

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

Researchers at Carnegie Mellon University built a new benchmark that measures how far AI agents can go when exploiting real vulnerabilities in Google's V8 engine. Mythos leads GPT-5.5 by a wide margin but costs twelve times as much.

Raccontata da

the-decoder.com

theregister.com

Confronto fonti

2 prospettive sulla stessa storia

AI · summaries

the-decoder.comStai leggendo1 mesi fa

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

originale

theregister.com1 mesi fa

AI agents show they can create exploits, not just find vulns

Mythos and GPT-5.5 muscle out the competition

Leggi questa versione → originale

Timeline cronologica

giovedì 14 maggio 2026·the-decoder.com
New Claude Mythos becomes the first AI model to clear all cyberattack simulations from Britain's AI safety agency
The UK's AI Security Institute has revised its estimate of how fast AI cyber capabilities are doubling—twice. First from eight months down to 4.7, and now Anthropic's Claude…
venerdì 15 maggio 2026·theregister.com
AI agents show they can create exploits, not just find vulns
Mythos and GPT-5.5 muscle out the competition
sabato 16 maggio 2026·the-decoder.com
New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously
Researchers at Carnegie Mellon University built a new benchmark that measures how far AI agents can go when exploiting real vulnerabilities in Google's V8 engine. Mythos leads…

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

AI agents show they can create exploits, not just find vulns

Timeline cronologica

New Claude Mythos becomes the first AI model to clear all cyberattack simulations from Britain's AI safety agency

AI agents show they can create exploits, not just find vulns

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

AI agents show they can create exploits, not just find vulns

Timeline cronologica

New Claude Mythos becomes the first AI model to clear all cyberattack simulations from Britain's AI safety agency

AI agents show they can create exploits, not just find vulns

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously