Arbor framework outperforms Claude Code and Codex by 2.5x in AI optimization benchmarks

Researchers at Renmin University of China’s Gaoling School of Artificial Intelligence and Microsoft Research released Arbor, an open-source framework, on June 10, 2026. Across six autonomous optimization tasks, Arbor’s average relative held-out gain was more than 2.5 times that of both OpenAI’s Codex and Anthropic’s Claude Code. It also achieved the best held-out test result on every task evaluated.

How Arbor actually works

Arbor uses Hypothesis-Tree Refinement (HTR), which organizes optimization work into a branching tree of hypotheses, experiments, evidence, and insights. Each branch builds on what came before, so an attempt is never treated as a standalone experiment.
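To make the tree idea concrete, here is a minimal Python sketch of how such a structure could be represented. It is not Arbor's code: the class name `HypothesisNode`, its fields, the methods `refine` and `inherited_insights`, and the example hypotheses are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HypothesisNode:
    """One node in a hypothetical hypothesis tree (illustrative names only)."""
    hypothesis: str
    evidence: List[str] = field(default_factory=list)  # results of experiments
    insight: Optional[str] = None                      # lesson drawn from the evidence
    # The parent link is excluded from repr and comparison to avoid cycles.
    parent: Optional["HypothesisNode"] = field(default=None, repr=False, compare=False)
    children: List["HypothesisNode"] = field(default_factory=list)

    def refine(self, hypothesis: str) -> "HypothesisNode":
        """Create a child hypothesis that builds on this node."""
        child = HypothesisNode(hypothesis=hypothesis, parent=self)
        self.children.append(child)
        return child

    def inherited_insights(self) -> List[str]:
        """Collect recorded insights along the path from the root to this node."""
        chain: List[str] = []
        node: Optional[HypothesisNode] = self
        while node is not None:
            if node.insight is not None:
                chain.append(node.insight)
            node = node.parent
        return list(reversed(chain))


# Made-up example: three levels, with insights recorded on the first two.
root = HypothesisNode("Baseline configuration can be improved")
root.evidence.append("baseline score recorded")
root.insight = "learning rate matters most"
child = root.refine("Lower the learning rate")
child.insight = "smaller steps help stability"
grandchild = child.refine("Add warmup on top of the lower rate")
print(grandchild.inherited_insights())
```

The point of the sketch is the parent link: a new branch can read everything its ancestors learned, which is what separates a tree of refinements from a flat list of independent attempts.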

The architecture splits into two layers. A long-lived coordinator agent handles strategy, deciding which hypotheses are worth pursuing and how to sequence experiments. Short-lived executor agents then run those experiments in controlled environments. When an executor finishes its job and reports back, the coordinator absorbs the findings and refines its approach for the next round.
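The two-layer split can be sketched as a simple loop. The following Python is a toy under stated assumptions, not Arbor's implementation: `Coordinator`, `run_executor`, `Report`, `toy_evaluate`, the scores, and the rule for queuing follow-ups are all hypothetical, and a real executor would run in an isolated environment rather than call a local function.

```python
from typing import Callable, List, NamedTuple, Optional


class Report(NamedTuple):
    """What a hypothetical executor sends back to the coordinator."""
    experiment: str
    score: float


def run_executor(experiment: str, evaluate: Callable[[str], float]) -> Report:
    """Short-lived worker: run one experiment, report, and exit."""
    return Report(experiment, evaluate(experiment))


class Coordinator:
    """Long-lived planner that keeps findings across rounds."""

    def __init__(self, initial: List[str]) -> None:
        self.queue: List[str] = list(initial)
        self.findings: List[Report] = []

    def next_experiment(self) -> Optional[str]:
        return self.queue.pop(0) if self.queue else None

    def absorb(self, report: Report) -> None:
        best_so_far = max((r.score for r in self.findings), default=float("-inf"))
        self.findings.append(report)
        # Toy refinement rule (an assumption): follow up once on any
        # first-level experiment that sets a new best score.
        if report.score > best_so_far and "+" not in report.experiment:
            self.queue.append(report.experiment + "+refined")

    def best(self) -> Report:
        return max(self.findings, key=lambda r: r.score)


def toy_evaluate(experiment: str) -> float:
    """Stand-in for a real experiment; the scores are invented."""
    scores = {"a": 0.1, "b": 0.3, "a+refined": 0.2, "b+refined": 0.5}
    return scores.get(experiment, 0.0)


coordinator = Coordinator(["a", "b"])
while (experiment := coordinator.next_experiment()) is not None:
    coordinator.absorb(run_executor(experiment, toy_evaluate))
print(coordinator.best())
```

Only the coordinator holds state between rounds. Each executor call is stateless, which mirrors the article's description of executors as short-lived.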

The benchmark numbers

Other newsrooms on this story

Related reading

AI optimizer beats Claude Code, Codex by 2.5x

Researchers grow a hypothesis tree for AI coding agents

$100K savings in 3 months: Techie says Claude code is now 2x faster, 3x cheaper…

OpenAI’s big Codex update is a direct shot at Claude Code

Microsoft's Fara1.5 AI outperforms OpenAI and Google in web tasks

Local AST scanner that reduces AI coding agent token costs
