Artificial intelligence systems can now pass the Turing Test, and do so more convincingly than a human being, a study has found.
The Turing Test – devised by the pioneering mathematician Alan Turing, who called it the Imitation Game – is an experiment intended to check whether a machine can show the same intelligence as a human being. It has a human talk to unseen participants and then tests whether that person can distinguish the human ones from the artificial ones.
Now a new study claims to be the first to rigorously put modern large language models through the test. And it found that people judged some of the models to be human significantly more often than the real people who took part in the experiment.
The study looked at four leading large language models – including the latest ones powering ChatGPT and Meta’s LLaMa – as well as older systems. They were compared with real human beings.
In the study, the latest GPT-4.5 was thought to be human 73 per cent of the time, much more often than the real people. The latest LLaMa was thought to be human 56 per cent of the time, almost exactly the same rate as the people it was being compared with.
The experiments with older models showed how quickly the systems have advanced. GPT-4o – first released in 2024 – was thought to be human 21 per cent of the time, while the ELIZA system from the 1960s was judged to be human 23 per cent of the time.
“What we found is that if given the right prompts, advanced LLMs can exhibit the same tone, directness, humor and fallibility as humans,” said the study’s corresponding author Cameron Jones.
“While we know LLMs can easily produce knowledge on nearly every topic, this test showed that it can also convincingly display social behavioural traits, which has major implications for how we think of AI.”
The study could lead to a rethinking of the Turing Test, researchers suggested.
“The Turing test started as a way to ask whether machines could rival human intelligence,” said study coauthor Ben Bergen, a professor of cognitive science at the University of California San Diego. “But now we know AI can answer many questions faster and more accurately than people can, so the real issue isn’t raw brainpower.
“Seeing that machines can pass the test – and seeing how they pass it – forces us to rethink what it measures. Increasingly, it’s measuring humanlikeness.”
The research also showed the importance of prompts in creating convincing chatbots. Each of the systems was instructed to adopt a persona, or a specific character and communication style, and that worked partly by leading the systems to make mistakes in the same way a human would, the researchers suggested.
Without those prompts, the models were far more likely to be caught out. GPT-4.5 was seen as human only 36 per cent of the time without explicit instructions, the researchers showed.
“They have the ability to appear human-like, but maybe not as much the ability to figure out what it would take to appear human-like,” said Professor Bergen.
The work is reported in a new paper, ‘Large Language Models Pass a Standard Three-Party Turing Test’, published in the journal Proceedings of the National Academy of Sciences.
AI can finally pass the Turing Test better than a human, study warns
Experiment should lead to a re-evaluation of how we understand whether people on the internet are really human, researchers suggest









