How accurate are AI chatbots when we ask them a medical question?

When we use a search engine such as Google to look up information, it is now standard to see an artificial intelligence (AI) answer at the top of the results page. It is a measure of how much AI is part of our lives.

But how accurate are AI chatbots when they are asked a medical question?

A couple of recent research papers have come up with some disturbing answers. The authors of one study, published in BMJ Open, put five of the world’s most popular chatbots through a systematic health-information stress test.

The chatbots – ChatGPT, Gemini, Grok, Meta AI and DeepSeek – were each asked 50 health and medical questions covering topics such as cancer, vaccines, stem cells and nutrition. Two experts independently rated every answer. They found that nearly a fifth of the answers were highly problematic, half were problematic, and 30 per cent were somewhat problematic. None of the chatbots produced fully accurate reference lists.

Chatbot performance varied by topic. They handled vaccines and cancer best – fields with large, well-structured bodies of research – yet still produced problematic answers roughly a quarter of the time.

Significantly, the chatbots struggled most with open-ended questions – 32 per cent of those answers were rated highly problematic, compared with just 7 per cent for closed questions. Most health queries people ask are open-ended. They tend not to ask chatbots true-or-false questions.

When the researchers asked each chatbot for 10 scientific references, none managed a single fully accurate reference list in 25 attempts.

A separate study published in Nature Medicine also found chatbot answers to be incomplete – but with an interesting twist. Dr Rebecca Payne, a GP and senior clinical lecturer at Bangor University in Wales, and colleagues gave participants brief descriptions of common medical situations. They were randomly assigned either to use one of three widely available chatbots or to rely on whatever sources they would normally use at home.

After interacting with the chatbot, they were asked two questions: what condition might explain the symptoms? And where should they seek help?

People who used chatbots were less likely to identify the correct condition than those who didn’t. They were also no better at determining the right place to seek care than the control group. In other words, interacting with a chatbot did not help people make better health decisions.

When the researchers then removed the human element and gave the same scenarios directly to the chatbots, their performance improved dramatically. Without human involvement, the models identified relevant conditions in the vast majority of cases and often suggested appropriate levels of care.

So why did the results deteriorate when people actually used the systems?

Chatbots frequently mentioned the relevant diagnosis somewhere in the conversation, yet participants did not always notice or remember it when summarising their final answer. In other cases, users provided incomplete information or the chatbot misinterpreted key details. According to Dr Payne, the issue was not simply a failure of medical knowledge – it was a failure of communication between human and machine.

Writing about her research in The Conversation, Payne says the lesson from her study is not that AI has no place in healthcare. Rather the key is understanding what these systems are currently good at and where their limitations lie.

“One useful way to think about today’s chatbots is that they function more like secretaries than physicians. They are remarkably effective at organising information, summarising text and structuring complex documents. These are the kinds of tasks where language models are already proving useful within healthcare systems, for example in drafting clinical notes, summarising patient records or generating referral letters.”

The role of AI in medicine is likely to be more supportive than revolutionary in the near term. Chatbots should not be expected to act as the front door to healthcare. They are simply not ready to diagnose conditions or direct patients to the right level of care.

mhouston@irishtimes.com
