AI coding agents find the right file but miss the exact lines that matter, study shows

AI coding agents like Claude Code or Codex reliably find the right file but miss most of the critical lines within it. The new SWE-Explore benchmark is the first to test code search separately from the actual repair, and it shows that without enough context, even the best fix will fail.

Sunday, June 14, 2026

A new benchmark separates code search from the actual fix and exposes a hidden weakness of AI coding agents. They land in the right neighborhood but miss the crucial spots.

Until now, AI coding has mostly been judged by the result. Did the agent fix the bug or not? That single metric hides what actually went wrong. Maybe the agent never read the relevant code. Maybe it saw the correct file and still wrote the wrong patch. Either way, the outcome looks the same.

An international research team involving Shanghai Jiao Tong University is tackling this blind spot with SWE-Explore. The benchmark only evaluates the first phase of the process. An agent receives a bug description and a software project, then returns a ranked list of code sections it considers relevant.

Conventional benchmarks measure only the repair rate and don't reveal whether an agent even read the relevant code. SWE-Explore isolates this upstream search phase. | Image: Zhang et al.
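To make the gap between "right file" and "right lines" concrete, here is a minimal sketch of how a ranked list of code sections could be scored against a reference. The article does not describe SWE-Explore's actual metrics or data format, so everything here is an assumption for illustration: the `Span` type, the `file_recall` and `line_recall` functions, the top-k cutoff, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A code section: a file path plus an inclusive, 1-based line range."""
    path: str
    start: int
    end: int


def lines_of(spans):
    """Expand spans into a set of (path, line_number) pairs."""
    return {(s.path, n) for s in spans for n in range(s.start, s.end + 1)}


def file_recall(ranked, reference, k=None):
    """Fraction of reference files that appear among the top-k predicted spans."""
    ref_files = {s.path for s in reference}
    if not ref_files:
        raise ValueError("reference must not be empty")
    top = ranked if k is None else ranked[:k]
    return len({s.path for s in top} & ref_files) / len(ref_files)


def line_recall(ranked, reference, k=None):
    """Fraction of reference lines covered by the top-k predicted spans."""
    ref_lines = lines_of(reference)
    if not ref_lines:
        raise ValueError("reference must not be empty")
    top = ranked if k is None else ranked[:k]
    return len(lines_of(top) & ref_lines) / len(ref_lines)


# Hypothetical example: the relevant code is 20 lines in one file.
reference = [Span("billing/invoice.py", 120, 139)]

# The agent's ranked answer: the right file, but mostly the wrong lines,
# followed by an unrelated file.
ranked = [
    Span("billing/invoice.py", 100, 124),
    Span("billing/tax.py", 1, 30),
]

print(file_recall(ranked, reference))  # 1.0
print(line_recall(ranked, reference))  # 0.25
```

In this made-up case the agent scores perfectly at the file level while covering only a quarter of the lines that matter, which is the pattern the study describes.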

Successful runs set the reference
