The world’s leading artificial intelligence groups are struggling to get their AI models to show accurately how they operate, an issue experts say will be crucial to keeping the powerful systems in check.
Anthropic, Google, OpenAI and Elon Musk’s xAI are among the tech groups to have developed a technique called “chain of thought”, which asks their AI “reasoning” models to solve problems step by step while showing how they work out the response to a query.
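As a loose illustration of the idea, the sketch below prompts a model to show its working before stating a final answer. It uses the OpenAI Python SDK for concreteness; the prompt wording, the `solve_step_by_step` helper and the model name are illustrative assumptions, not any lab’s actual method.

```python
# A minimal sketch of chain-of-thought prompting. The prompt wording,
# helper name and model name are illustrative assumptions, not any
# lab's actual technique.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def solve_step_by_step(question: str, model: str = "gpt-4o") -> str:
    """Ask the model to show its reasoning before its final answer."""
    prompt = (
        "Solve the problem below. Think through it step by step and "
        "show your reasoning, then give the final answer on its own "
        "line prefixed with 'Answer:'.\n\n"
        f"Problem: {question}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The reply contains both the visible reasoning steps and the final
    # answer; the "misbehaviour" researchers describe is when the two
    # do not agree.
    return response.choices[0].message.content
```

Inspecting that returned reasoning text is what lets researchers compare a model’s stated steps against its final answer.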
While company researchers say this process has yielded valuable insights that help them build better AI models, they are also finding examples of “misbehaviour”, where generative AI chatbots give a final response at odds with the reasoning they displayed along the way.
These inconsistencies suggest the world’s top AI labs are not wholly aware of how generative AI models reach their conclusions. The findings have fed into broader concerns about retaining control over powerful AI systems, which are becoming more capable and autonomous.
“That [chain-of-thought] text is going to become important for really interrogating how these models work and how they think, especially in some of these [dangerous] edge cases,” Jack Clark, co-founder of Anthropic, told the Financial Times. Clark highlighted the potential for such systems to be used to assist the development of biological weapons.