By Vilius Vystartas | May 2026
Ten more models ran through the same 10 agent coding tasks. Two tied the all-time record of 90%. One got there for $0.0002. The other hit the same score at $0.0018, still cheaper than most models scoring 70%.
Batch 10 was the cheapest one yet.
The Leaders
Two models scored 90% with zero hard fails, joining MiniMax M2 Her and Baidu Ernie 4.5 300B as the highest-scoring models on this benchmark:
