By Vilius Vystartas | May 2026

Ten more models ran through the same 10 agent coding tasks. Two tied the all-time record. One cost $0.0002. The other matched that score at $0.0018, still cheaper than most models scoring 70%.

Batch 10 was the cheapest one yet.

The Leaders

Two models scored 90% with zero hard fails, joining MiniMax M2 Her and Baidu Ernie 4.5 300B as the highest-scoring models on this benchmark: