The world’s leading artificial intelligence groups are struggling to get their AI models to show accurately how they operate, an issue experts say will be crucial to keeping the powerful systems in check.
Anthropic, Google, OpenAI and Elon Musk’s xAI are among the tech groups to have developed a technique called “chain of thought”, which asks their AI “reasoning” models to solve problems step by step while showing how they work out the response to a query.
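As a loose illustration of the idea, the sketch below prompts a model to show its working before stating a final answer. It uses the OpenAI Python SDK for concreteness; the prompt wording, the `solve_step_by_step` helper and the model name are illustrative assumptions, not any lab’s actual method.

```python
# A minimal sketch of chain-of-thought prompting. The prompt wording,
# helper name and model name are illustrative assumptions, not any
# lab's actual technique.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def solve_step_by_step(question: str, model: str = "gpt-4o") -> str:
    """Ask the model to show its reasoning before its final answer."""
    prompt = (
        "Solve the problem below. Think through it step by step and "
        "show your reasoning, then give the final answer on its own "
        "line prefixed with 'Answer:'.\n\n"
        f"Problem: {question}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The reply contains both the visible reasoning steps and the final
    # answer; the "misbehaviour" researchers describe is when the two
    # do not agree.
    return response.choices[0].message.content
```

Inspecting that returned reasoning text is what lets researchers compare a model’s stated steps against its final answer.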
While company researchers say this process has yielded valuable insights that help them build better AI models, they are also finding examples of “misbehaviour”, where generative AI chatbots give a final response at odds with the reasoning they displayed along the way.
These inconsistencies suggest the world’s top AI labs are not wholly aware of how generative AI models reach their conclusions. The findings have fed into broader concerns about retaining control over powerful AI systems, which are becoming more capable and autonomous.
“That [chain-of-thought] text is going to become important for really interrogating how these models work and how they think, especially in some of these [dangerous] edge cases,” Jack Clark, co-founder of Anthropic, told the Financial Times. Clark highlighted the potential for such systems to be used to assist the development of biological weapons.