AI models often give the right answers but point to the wrong sources

When analyzing documents, leading AI models like GPT and Gemini routinely cite passages that don't actually support their answers. Even when the answer is right, the cited evidence is often wrong. Researchers at Peking University call this "attribution hallucination," a risk for regulated fields like law and medicine. Their new CiteVQA benchmark is the first to test for it systematically.

Monday, May 25, 2026

Just because a language model nails a question about a PDF doesn't mean it actually found the answer where it claims to.

Researchers at Peking University and the Shanghai Artificial Intelligence Laboratory built a new benchmark called CiteVQA to expose this gap between getting the right answer and pointing to the right source. They call it "attribution hallucination."

CiteVQA checks both the answer and the source location. A correct answer paired with a wrong citation gets an SAA score of 0; only a correct citation counts. | Image: Ma et al.
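The scoring rule in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual code: the `Prediction` class, both scoring functions, and the sample data are hypothetical, and the sketch treats "is the citation correct?" as an already-decided yes or no, since the article doesn't say how CiteVQA matches citations. It also assumes a wrong answer scores 0 regardless of its citation, which the caption doesn't state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One model response, already judged by a grader (hypothetical structure)."""
    answer_correct: bool    # the answer matches the reference answer
    citation_correct: bool  # the cited location matches the annotated evidence


def answer_only_score(pred: Prediction) -> int:
    """Answer-only grading: the citation is ignored entirely."""
    return int(pred.answer_correct)


def saa_score(pred: Prediction) -> int:
    """SAA-style grading as the caption describes it: a correct answer
    with a wrong citation scores 0.
    Assumption: a wrong answer also scores 0, whatever it cites."""
    return int(pred.answer_correct and pred.citation_correct)


def mean_score(preds: list[Prediction], scorer) -> float:
    """Average a per-prediction scorer over a list of predictions."""
    return sum(scorer(p) for p in preds) / len(preds)


# Made-up sample: three responses to three questions.
sample = [
    Prediction(answer_correct=True, citation_correct=True),
    Prediction(answer_correct=True, citation_correct=False),  # attribution hallucination
    Prediction(answer_correct=False, citation_correct=False),
]

print("answer-only:", mean_score(sample, answer_only_score))
print("SAA-style:  ", mean_score(sample, saa_score))
```

In the made-up sample, the second response counts toward answer-only accuracy but not toward the SAA-style score. That difference is the gap the benchmark is built to expose.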

Standard document analysis tests like DocVQA or MMLongBench-Doc only grade the final answer. They can't tell whether a model actually pulled information from the document or just guessed based on what it already knew. In law, financial audits, or medicine, though, traceability is what makes an AI output usable in the first place, the paper argues.

Pinpointing evidence
