Nature Medicine study finds general-purpose LLMs outperform dedicated medical AI tools

A study published June 12, 2026, in Nature Medicine found that general-purpose large language models consistently outperformed dedicated clinical AI products across standardized medical tasks. The general-purpose models were also preferred by the clinicians using them.

What the study actually tested

The researchers pitted three major general-purpose LLMs against purpose-built medical tools. On one side: OpenAI's GPT-5.2, Google's Gemini 3.1 Pro Preview, and Anthropic's Claude Opus 4.6. On the other: dedicated clinical products such as OpenEvidence and UpToDate Expert AI, which are designed and marketed specifically for healthcare professionals.

The battleground included MedQA questions, which are drawn from medical licensing exams and widely used as a benchmark of medical knowledge. The general-purpose models excelled across these tasks, beating the specialists on their home turf.

Google Search AI Overview was included as a control, representing the kind of quick-reference tool physicians actually reach for during a busy shift.
