Fable 5 didn't win.
I need to say that up front because the timing of this post is going to make it sound like a very different story. Yes, we benchmarked Claude Fable 5 on our homelab harness. Yes, the US government suspended it about three hours later. But the actual result? Fable 5 scored 89.3. Opus 4.8 scored 91.9. The model everyone's eulogizing right now lost to a model you can still use today.
That's the real story. The suspension is just what makes it weird.
What We Tested
This is Round 6 of our homelab bakeoff series, but with a twist. Rounds 1 through 5 tested quantized local models on an RTX 5090 via llama.cpp. This time we pointed the same task suite at four frontier cloud models:















