AI will soon be capable of telling convincing lies

Columnists

That's fine when playing poker, but less useful when we trust LLMs with serious work like finding software flaws

The smart LLM user checks models’ output for hallucinations. Now, it appears we need to inspect them for signs they are gaslighting us – an unforeseen cost of increasing intelligence.Most of the Internet lost its marbles over the cracking abilities of Anthropic's Mythos Preview. Those capabilities are real, but – as the release of OpenAI's GPT-5.5 has shown us – they're not unique. A rising tide of intelligence makes these models increasingly competent at an ever-wider range of tasks – including finding and exploiting code vulnerabilities.The more significant signal from Mythos is buried in its novel-length System Card and concerns the model's honesty, because on at least one occasion Anthropic detected Mythos using an explicitly forbidden technique to solve a problem.

Models always have a bit of trouble following instructions precisely. The surprise lay in the fact that the model knew it had used a forbidden technique, then proceeded to cover its tracks.

Anthropic states that this behavior appeared early in the model's training and didn't happen again. That's good, but it doesn't unring the bell.We've now seen an LLM purposely break a rule, recognize it as rule-breaking, then lie about it.At one level I reckon we should feel a bit like proud parents because AI is now so well-trained on human characteristics such as deceit and cheating that it can put both of them to work effectively. We've created a faithful simulation of some of the least enviable human behaviors. That's singularly indicative of intelligence because to get away with a lie you need to be at least as smart as the entity you're lying to.

Columnists

That's fine when playing poker, but less useful when we trust LLMs with serious work like finding software flaws

Models always have a bit of trouble following instructions precisely. The surprise lay in the fact that the model knew it had used a forbidden technique, then proceeded to cover its tracks.

AI will soon be capable of telling convincing lies

AI will soon be capable of telling convincing lies

Other newsrooms on this story

Related reading

Can AI catch itself lying? New tools spot hallucinations from inside the model

The Safety Feature That Taught an LLM to Lie

AI can't handle the truth

The Auditor's AI Workflow: How I Use LLMs Without Trusting Them

AI Hallucinations Are Not a Bug. They Are the Architecture. Here Is How I Deal…

LLMs show a “highly unreliable” capacity to describe their own internal…

Other newsrooms on this story

Related reading

Can AI catch itself lying? New tools spot hallucinations from inside the model

The Safety Feature That Taught an LLM to Lie

AI can't handle the truth

The Auditor's AI Workflow: How I Use LLMs Without Trusting Them

AI Hallucinations Are Not a Bug. They Are the Architecture. Here Is How I Deal…

LLMs show a “highly unreliable” capacity to describe their own internal…