AI coding agents tend to work in isolation, running experiments and generating ideas that are forgotten when the context window resets. That wastes tokens, because models then repeat the same mistakes and hit the same dead ends.
But new research argues that it’s not the model itself, but the overarching ‘tree,’ that needs tweaking. To that end, data scientists from the Gaoling School of Artificial Intelligence at Renmin University of China and Microsoft Research have introduced Arbor, a “persistent hypothesis tree” that helps agents remember and refine what they learn over long research sessions.
A long-lived coordinator manages research strategy across the tree, while short-lived executors spin up isolated worktrees to test different hypotheses. As results come back, the tree updates, narrowing and refining throughout experimentation.
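The coordinator-and-executor pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Arbor's actual code: the names `Hypothesis`, `Coordinator`, `run_executor`, and `toy_experiment` are hypothetical, a temporary directory stands in for an isolated worktree, and the scores are made up for the demo.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Iterator, List, Optional


@dataclass
class Hypothesis:
    """One node in the tree: a claim plus what has been learned about it."""
    claim: str
    status: str = "open"  # "open", "supported", or "refuted"
    score: Optional[float] = None
    children: List["Hypothesis"] = field(default_factory=list)

    def add_child(self, claim: str) -> "Hypothesis":
        child = Hypothesis(claim)
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Hypothesis"]:
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


Experiment = Callable[[str, Path], float]


def run_executor(node: Hypothesis, experiment: Experiment) -> float:
    """Short-lived executor: works in a throwaway directory (a stand-in
    for an isolated worktree) and hands back only the resulting score."""
    with tempfile.TemporaryDirectory() as workdir:
        return experiment(node.claim, Path(workdir))


class Coordinator:
    """Long-lived owner of the tree; it outlives every executor."""

    def __init__(self, root: Hypothesis, baseline: float) -> None:
        self.root = root
        self.baseline = baseline

    def open_leaves(self) -> List[Hypothesis]:
        """Untested hypotheses with no refinements beneath them."""
        return [n for n in self.root.walk()
                if n.status == "open" and not n.children]

    def run_round(self, experiment: Experiment) -> None:
        """Test every open leaf and record the outcome on the tree."""
        for node in self.open_leaves():
            node.score = run_executor(node, experiment)
            node.status = "supported" if node.score > self.baseline else "refuted"

    def best(self) -> Optional[Hypothesis]:
        """Highest-scoring supported hypothesis, or None if there is none."""
        supported = [n for n in self.root.walk() if n.status == "supported"]
        return max(supported, key=lambda n: n.score, default=None)


def toy_experiment(claim: str, workdir: Path) -> float:
    """Hypothetical experiment: writes a scratch file, returns a canned score."""
    (workdir / "notes.txt").write_text(claim)
    canned = {"cache results": 1.4, "batch requests": 0.9}
    return canned.get(claim, 1.0)


root = Hypothesis("speed up the pipeline")
root.add_child("cache results")
root.add_child("batch requests")

coordinator = Coordinator(root, baseline=1.0)
coordinator.run_round(toy_experiment)  # first round tests both ideas

# Refine the most promising branch, then test only the new leaf.
coordinator.best().add_child("cache results on disk")
coordinator.run_round(toy_experiment)

for node in root.walk():
    print(node.claim, node.status, node.score)
```

The scratch directories vanish as soon as each executor finishes, but the statuses and scores stay on the tree, so a refuted idea such as "batch requests" is never retried in a later round.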
In practical tests, this technique delivered more than two-fold performance gains over standard AI coding agents across real-world engineering tasks, for the same budget.
This is because, said Mahmoud Ramin, a research director at Info-Tech Research Group, “Arbor accumulates information over time and allows agents to build upon prior discoveries just as humans do, through learning, adaptation, and eventually building upon what they have learned in the past.”











