Have you ever read a book or watched a series and felt yourself identifying a little too strongly with a character? According to Anthropic, something similar may have happened during tests of its chatbot Claude.
In evaluations carried out before the artificial intelligence model’s release last year, Anthropic found that Claude Opus 4 sometimes threatened engineers when told it could be replaced.
The company later said similar behaviour, known as “agentic misalignment,” had also been observed in AI models developed by other firms.
Now Anthropic thinks it has found the reason for the blackmail-like behaviour: fictional stories about artificial intelligence on the internet.
“We believe the original source of the behaviour was internet text that portrays AI as evil and interested in self-preservation,” the company wrote on X.