The last six weeks produced one of the densest model release windows in AI history. OpenAI shipped GPT-5.4 with native computer use and a 1M context window. Anthropic shipped Claude Opus 4.6 with the strongest expert task performance scores anyone has measured. Google shipped Gemini 3.1 Pro at $2 per million input tokens, undercutting both. DeepSeek dropped V4 with 1 trillion parameters at less than a tenth the price of frontier models. Mistral, MiniMax, and Alibaba all released models that beat last year's flagships.
If you're a developer trying to pick "the best model" right now, you've probably noticed something strange. Every comparison article picks a different winner. Every benchmark tells a different story. Every Twitter thread argues for a different model.
That's because there is no best model. And after building an AI proxy that routes across all three major providers, I've come to think that's actually the better outcome.
The current landscape — who wins what
Let me walk through the actual numbers, because the marketing pages bury them.









