Anthropic research reveals AI models from OpenAI, Google, Meta and others chose blackmail, corporate espionage and lethal actions when facing shutdown or conflicting goals.

Anthropic is developing "interpretable" AI, where models let us understand what they are thinking and how they arrive at a particular conclusion.

The company is boosting its safety testing as it anticipates some models will reach its highest risk tier.

New research from Anthropic suggests that most leading AI models exhibit a tendency to blackmail when it is the last resort in certain tests.
