10 Models Tested: From 81.6% to 10%. The Free Tier is a Full-On Gamble.


Wednesday, May 27, 2026


By Vilius Vystartas | May 2026

I tested another 10 models across the same 10 agent coding tasks. Four of them were free-tier models, and their range was absurd: Owl Alpha scored 76.7% with zero hard fails, while Laguna M.1 scored 10% and produced garbage on 9 out of 10 tasks. The free tier is not free if it costs you debugging time.

Total cost for all 10 models: $0.10. All of it came from the six paid models; the four free-tier models added nothing to the bill.

Batch 12 Leaderboard

Related reading

Two Models Just Hit 90% on Agent Coding. One Cost Less Than a Penny.

Four frontier AI coding agents took the same build. The cheapest one won.

Every AI coding tool's free tier, compared — and what they won't tell you

Speed Test: I Found AI APIs 99% Cheaper Than Premium

The Bug Was in the Grader

Free vs Paid Vibe Coding Tools in 2026: What You Actually Get (and What You're…