By Vilius Vystartas | May 2026
I tested another 10 models across the same 10 agent coding tasks. Four of them were free-tier models, and the range was absurd: Owl Alpha scored 76.7% with zero hard fails, while Laguna M.1 scored 10% and produced garbage on 9 out of 10 tasks. The free tier is not free if it costs you debugging time.
Total cost for all 10 models: $0.10. The paid models (6 of 10) account for all of it.
Batch 12 Leaderboard