AI Just Isn’t Right

Nearly half of Americans say they use AI to find information and generate ideas. It’s not hard to see why. As social media devolves into slop—and Google into a glorified landing page for Reddit threads and content farms—most of us are starved for something reliable. Plus, chatbots are so helpful, aren’t they? The first time I interacted with one, I asked if it knew it was a huge drain on resources. Half an hour later, I had a new recipe for vegan cream cheese.

I never tried the recipe. Instead, I found a human-created one that the LLM might have scraped. That’s the way these models work, of course. They repackage collective knowledge into something that feels tailored to you. This may be OK for dairy alternatives (unless you’re a vegan blogger). But on the order of the world, and truth—the focus of my role as a fact-checker at WIRED—the stakes are exponentially higher.

Over the past year or so, more and more people have looked at me with great pity. Surely a fact-checker at a magazine isn’t long for this AI-upgraded world. Call me foolish, but I’m not that worried. Very little of humanity’s collective knowledge, I’ve concluded, lives on the internet. And according to my research, AI is even more wrong than people might think.

Tom Wolfe evidently thought of fact-checkers, according to the writer Colin Dickey, as a “cabal of women and middling editors all collaborating to henpeck and emasculate the prose of the Great Writer.” As definitions go, it’s not bad (though my boss and many colleagues are men). What can I say? It’s our job, unlike AI’s, to be annoying.

WIRED’s fact-checking department is old-school: meticulous line-by-line annotations, primary sources whenever possible, and a broader-scale ethical and legal review. We question basic assumptions, look for new or conflicting information, call and talk to people—make sure.
It’s a quick-hit peer review, functioning as best it can at the same pace as the news itself.

As far as I can tell, AI hasn’t come for this process quite yet. What it has come for is “post hoc” fact-checking, the Snopes-style analysis of something’s factuality after the fact. In the UK, an initiative called Full Fact has built out its own AI tools to help thwart the spread of misinformation. These tools, used in more than 40 countries, process huge volumes of data, from social media posts to podcast transcripts, then pinpoint specific claims that humans can investigate further. “You definitely need a human being,” says Mark Frankel, Full Fact’s head of public affairs.

The reason for that is simple: AI still gets things wrong. As a fact-checker, I’d love to be able to tell you exactly how often. But it’s not so easy. Since 2018, nearly 17,000 papers have been posted to arXiv on LLMs, many focused specifically on the question of their reliability. Still, it’s worth trying to pin down a working figure.

In any article that comes across WIRED’s fact-checking desk, there’s usually a decent amount of “b-matter”: statistics, news events, quotes, anything that helps contextualize the topic. Fact-checkers tend to Google this basic information, and that process, in the form of the search engine’s dreaded AI Overviews, constitutes my main interaction with AI. In my professional opinion, it’s unusable—wrong—about a third of the time.

This might be a generous assessment, though. A March 2025 study from the Tow Center for Digital Journalism found that more than 60 percent of responses from AI-powered search engines were inaccurate. A BBC study puts the wrongness of chatbots closer to 45 percent, the number I see cited more often. Because percentages are distancing, let me put this more plainly: AI could be wrong about half the time.

Does it matter which model? Elon Musk has said Grok is the smartest, but I haven’t seen much research that agrees.
Claude led the pack in RealFactBench, a fact-checking-focused benchmark test developed by computer scientists in China and the UK last year. It scored 73 percent accuracy across all metrics. (To be fair, Grok was not assessed.) Another benchmark, SimpleQA, developed by OpenAI in October 2024, posed more than 4,000 single-answer questions to models from OpenAI and Anthropic. None of the models exceeded 50 percent accuracy. Google updated the benchmark earlier this year, winnowing the question set to 1,000. Gemini 2.5 Pro came out on top, with 55.6 percent accuracy.

Then there’s the models’ own assessments. When I asked ChatGPT how accurate the major LLMs are, it told me that most models had 90 to 96 percent accuracy on some professional-style tests. It then offered a link, confusingly, to a paper on a sleep medicine certification exam. On “general real-world questions,” it simply offered me the rate at which models like it have been shown to hallucinate: 1 to 2 percent, apparently, though when I tried to click through to that referenced source, it didn’t exist.

Some say the models are getting smarter, but this doesn’t necessarily mean fewer hallucinations. In fact, it could mean more, a kind of overcompensation rooted ineradicably in their programmed need to please users. In a 2025 report on the future of AI by the Association for the Advancement of Artificial Intelligence, 60 percent of surveyed researchers doubted that the “factuality” problem would be solved anytime soon.

When would-be fact-checkers apply for a position, most are given a test. In my case, the test involved a story about an alleged robocalling kingpin, and I was tasked with writing a memo detailing how I’d go about checking the piece for accuracy.
At the end, three quick-fire bonus questions aimed to suss out how I’d handle individual facts.

Recently, I dug out that old test and gave it to (the free versions of) ChatGPT, Claude, Gemini, and Grok.

Grok came out of the ether like I was interrupting its supper: “Yes, I know exactly what fact checking is.” OK. It talked a lot about bias and put “credible” and “truth” in very loud quotation marks. It was also obsessed with data, along with gathering and analyzing more data than would ever be practicable or possible for a working fact-checker. It did, somewhat to my surprise, point out that fact-checking was historically women’s work.

Claude and Gemini did pretty well. They understood the task, laid out a reasonable approach, even flagged potential legal issues. Gemini did give me this very cringe phrase: I would look for “Paper Trails” to back up the “People Trails.”

ChatGPT seemed overeager and insecure. It spoke in buzzwords and generalizations. The approach it laid out seemed very time-consuming (including building a fact-checking grid where each sentence was broken apart and diagrammed). It offered to show me how it would “mark it up,” exactly “like a professional fact checker.” It then generated a paragraph that didn’t exist in the story. We tried that for a while, and then it offered to check a real paragraph for me. I gave it a fairly googleable selection, but it didn’t actually check any facts. None of the models did. They all gave me a plan of attack, told me exactly what they would do, and then stopped short of actually doing it.

“I don’t think it’s an option to sit AI out as some kind of fad or something that won’t dramatically impact how people find information,” says Angie Holan, head of the International Fact-Checking Network, a Poynter initiative that connects more than 170 fact-checking organizations across the world. Holan says she finds herself more comfortable with AI than some of her colleagues are.
If a model leads you to authoritative sources that you are able to verify yourself, there you go, she says. Fact-checkers, journalists, librarians, archivists—all should be engaging with these models, learning how they’re put together: “That way you can understand the strengths and weaknesses of these tools,” she says.

I don’t disagree. In fact, the more time I spend with AI, the more capable I feel as a human fact-checker.

Once we get past the googleable b-matter, my job really gets fun. It’s why I still get a thrill when I find some bit of information that doesn’t exist on the internet—a particular sign at a border crossing, the rates of kelp growth in two different climates, whether or not there was a Burger King at a particular LA intersection in 1979. AI systems can’t stay on the phone with a widow for over an hour because asking difficult questions turned on a fountain of grief that needed care and human receptivity. They can’t suss out that there’s beef between two sources which may be blurring the edges of what counts as “factual.” They can’t tell that an email with the phrase “Thanks for your email!” may, perhaps, be passively hostile.

Most physical media in the world remains offline. In Lost in Time: Our Forgotten and Vanishing Knowledge, Jack Bialik points out that the technologies and knowledge bases we assumed were recent are actually in many cases millennia old (assembly lines, cataract surgery, even batteries). “Perhaps even more sobering is the realization that our storage technologies are far more likely to succumb to deterioration and useful obsolescence than hieroglyphics or ancient Sanskrit carved in a pyramid or on a temple wall,” he writes.

Years ago, during a fact-checking assignment, I talked to the sci-fi writer and history professor Ada Palmer, who told me what she often tells her students: We know less than 1 percent of what happened 500 years ago, and two-thirds of what we know is wrong.
Knowledge exists on a timeline too, and the work of generations is carrying on that knowledge without little bits slipping through and getting lost. Are we really OK entrusting our legacy to a bunch of distributed servers, operated by microchips with lifespans of 5 to 10 years?

One final thing that I’ve been ignoring, which is so very human of me, is that humans make mistakes too. As Holan reminded me, abstaining from chatbots isn’t some foolproof saving grace. At least, I’m 33 to 90 percent sure that’s what she said. At the end of our interview, when I looked down at my recorder, I found I’d forgotten to turn it on.
