AI search agents often confirm what they already know instead of actually researching the web

Leading AI search agents like GPT-5.4 and Kimi K2.6 don't appear to do much actual research on established benchmarks. They mostly just use the web to confirm what they already learned during training. Researchers at the Harbin Institute of Technology found this using a new time-based benchmark called LiveBrowseComp, which only asks about events from the last 90 days. Once the models can't fall back on memory, performance falls apart and the existing rankings get reshuffled.

May 31, 2026

Nano Banana Pro prompted by THE DECODER

A new study suggests that leading AI search agents don't actually research on established benchmarks; they mostly use the web to confirm answers they already have. Once models have to go beyond their existing knowledge, search performance falls apart.

Frontier models like GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6, DeepSeek-V4-Pro, and Kimi-K2.6 keep posting higher scores on BrowseComp. The benchmark asks agents complex questions that can only be answered through multi-step browsing and piecing together information from different web sources.

Researchers from the Harbin Institute of Technology and Xiaohongshu have now shown in a study that these results say less about the agents' research skills than assumed. The authors call the underlying effect "intrinsic knowledge dependence" (IKD): a reliance on internal knowledge the models absorbed during training.
