By Vilius Vystartas | May 2026

I tested another 10 models across the same 10 agent coding tasks. Four of them were free-tier models, and the range was absurd: Owl Alpha scored 76.7% with zero hard fails, while Laguna M.1 scored 10% and produced garbage on 9 out of 10 tasks. The free tier is not free if it costs you debugging time.

Total cost for all 10 models: $0.10. The four free-tier models cost nothing, so the six paid models account for the entire $0.10.

Batch 12 Leaderboard

#