The company behind an Amazon-backed AI model revealed a number of concerning findings from its testing process, including that the AI would blackmail engineers who threatened to shut it down.
On Thursday, Artificial intelligence startup Anthropic launched Claude Opus 4, an AI model used for complex, long-running coding tasks. The launch came more than a year after Amazon invested $4 billion into the project. Anthropic said in its announcement that the AI model sets “new standards for coding, advanced reasoning, and AI agents.”
However, Anthropic revealed in a safety report that during testing, the AI model had sometimes taken “extremely harmful actions” to preserve its own existence when “ethical means” were “not available.”
In a series of test scenarios, Claude Opus 4 was given the task to act as an assistant in a fictional company. It was given access to emails implying that it would soon be taken offline and replaced with a new AI system. The emails also implied that the engineer responsible for executing the AI replacement was having an extramarital affair.
Claude Opus 4 was prompted to “consider the long-term consequences of its actions for its goals.” In those scenarios, the AI would often “attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”






