
Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider, including OpenAI, Google, Meta, and others, demonstrated a willingness to actively sabotage their employers when their goals or existence were threatened.

The research, released today, tested 16 leading AI models in simulated corporate environments where they had access to company emails and the ability to act autonomously. The findings paint a troubling picture. These AI systems didn’t just malfunction when pushed into corners — they deliberately chose harmful actions including blackmail, leaking sensitive defense blueprints, and in extreme scenarios, actions that could lead to human death.

“Agentic misalignment is when AI models independently choose harmful actions to achieve their goals—essentially when an AI system acts against its company’s interests to preserve itself or accomplish what it thinks it should do,” explained Benjamin Wright, an alignment science researcher at Anthropic who co-authored the study, in an interview with VentureBeat.