Lenz Research study finds AI models disagree on 67% of fact-check claims

Ask five of the world’s most advanced AI models whether something is true, and two-thirds of the time, at least one of them will disagree with the group. That’s the headline finding from a new study by Lenz Research, which tested GPT-5.4, Claude Opus 4.7, Gemini 3 Pro, Gemini 3 Pro + Search, and Sonar Pro on 1,000 real-world claims submitted by actual users to a fact-checking platform.

The results are sobering. Out of those 1,000 claims, 672, or 67%, saw at least one model dissent from the panel majority. In plain English: if you’re treating any single AI model as your personal oracle of truth, you’re rolling the dice more often than you think.

The numbers behind the disagreement

Lenz Research didn’t just measure whether models agreed or disagreed in a binary sense. They looked at the depth of disagreement, too. A full 343 claims, roughly 34%, showed what the researchers call “substantive disagreements,” where the most-disagreeing pair of models landed two or more verdict categories apart on a scale that ranged from True to Mostly True to Misleading to False.
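To make the two counts concrete, here is a minimal Python sketch of how a single claim could be classified. It is not the study's code. The four-step scale comes from the description above. The function names, the sample verdicts, and the reading of "dissent" as "the panel was not unanimous" are assumptions made for illustration.

```python
from collections import Counter

# Verdict scale as described in the article, ordered from True to False.
SCALE = ["True", "Mostly True", "Misleading", "False"]
RANK = {label: i for i, label in enumerate(SCALE)}


def has_dissent(verdicts):
    """Return True if at least one model departs from the most common verdict.

    Assumption: "dissent" is read here as "the panel was not unanimous".
    """
    top_count = Counter(verdicts).most_common(1)[0][1]
    return top_count < len(verdicts)


def max_spread(verdicts):
    """Distance, in verdict categories, between the two most-disagreeing models."""
    ranks = [RANK[v] for v in verdicts]
    return max(ranks) - min(ranks)


def is_substantive(verdicts):
    """Two or more categories apart counts as a substantive disagreement."""
    return max_spread(verdicts) >= 2


# Hypothetical five-model panels, not data from the study.
mild = ["True", "True", "Mostly True", "True", "True"]
sharp = ["True", "Misleading", "True", "Mostly True", "True"]

print(has_dissent(mild), is_substantive(mild))
print(has_dissent(sharp), is_substantive(sharp))
```

Under these assumptions, the first panel would count toward the 672 claims with a dissenter but not toward the 343 substantive disagreements, while the second would count toward both.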

To quantify the overall level of agreement, the study used Krippendorff’s alpha, a standard statistical measure of inter-rater reliability. The score came in at 0.639 on an ordinal scale. For context, a score of 1.0 means perfect agreement, and Krippendorff’s widely cited rule of thumb treats 0.8 or higher as reliable and 0.667 as the lowest level at which even tentative conclusions should be drawn. The models, in other words, landed just below the threshold where social scientists would start feeling comfortable drawing any conclusions from the results.
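For readers curious how such a score is produced, the following is a rough Python sketch of Krippendorff’s alpha with the ordinal distance metric, built from the standard coincidence-matrix definition. It is not the study's implementation. The function name and the toy inputs are hypothetical, and it assumes every verdict has already been mapped to an integer rank (0 for True through 3 for False).

```python
def krippendorff_alpha_ordinal(units, n_categories):
    """Krippendorff's alpha with the ordinal metric.

    units: one list of integer ranks (0 .. n_categories - 1) per claim,
           holding the verdicts of every model that rated that claim.
    """
    K = n_categories
    # Coincidence matrix: o[c][k] accumulates c-k verdict pairs within claims.
    o = [[0.0] * K for _ in range(K)]
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a claim with a single rating carries no pairing information
        counts = [0] * K
        for r in ratings:
            counts[r] += 1
        for c in range(K):
            for k in range(K):
                pairs = counts[c] * (counts[k] - (1 if c == k else 0))
                o[c][k] += pairs / (m - 1)

    n_c = [sum(row) for row in o]  # marginal total per category
    n = sum(n_c)                   # total number of pairable ratings

    def delta2(c, k):
        # Ordinal distance: squared count of ratings lying between the ranks.
        lo, hi = min(c, k), max(c, k)
        return (sum(n_c[lo:hi + 1]) - (n_c[lo] + n_c[hi]) / 2) ** 2

    d_o = sum(o[c][k] * delta2(c, k) for c in range(K) for k in range(K)) / n
    d_e = sum(n_c[c] * n_c[k] * delta2(c, k)
              for c in range(K) for k in range(K)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when every rating is identical")
    return 1 - d_o / d_e


# Toy input: three claims on which two hypothetical raters always agree.
print(krippendorff_alpha_ordinal([[0, 0], [1, 1], [3, 3]], 4))
```

Alpha compares the disagreement actually observed with the disagreement expected if verdicts were assigned by chance, so it equals 1.0 when raters always agree and drops toward zero, or below it, as agreement approaches or falls under chance level.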
