Here is this year's work, layer by layer, top down.

01 Frontier agents

Agents that do real work, measured on tasks you can't fake your way through.

DSGym: 1,000+ tasks across 10+ domains

ThunderAgent: Up to 3.6× faster agent inference

TTT-Discover: Beats best human, open model