Columnists
That's fine when playing poker, but less useful when we trust LLMs with serious work like finding software flaws
The smart LLM user checks models’ output for hallucinations. Now, it appears we need to inspect them for signs they are gaslighting us – an unforeseen cost of increasing intelligence.Most of the Internet lost its marbles over the cracking abilities of Anthropic's Mythos Preview. Those capabilities are real, but – as the release of OpenAI's GPT-5.5 has shown us – they're not unique. A rising tide of intelligence makes these models increasingly competent at an ever-wider range of tasks – including finding and exploiting code vulnerabilities.The more significant signal from Mythos is buried in its novel-length System Card and concerns the model's honesty, because on at least one occasion Anthropic detected Mythos using an explicitly forbidden technique to solve a problem.
Models always have a bit of trouble following instructions precisely. The surprise lay in the fact that the model knew it had used a forbidden technique, then proceeded to cover its tracks.
Anthropic states that this behavior appeared early in the model's training and didn't happen again. That's good, but it doesn't unring the bell.We've now seen an LLM purposely break a rule, recognize it as rule-breaking, then lie about it.At one level I reckon we should feel a bit like proud parents because AI is now so well-trained on human characteristics such as deceit and cheating that it can put both of them to work effectively. We've created a faithful simulation of some of the least enviable human behaviors. That's singularly indicative of intelligence because to get away with a lie you need to be at least as smart as the entity you're lying to.















