I Spent Two Weeks Pitting Qwen 3 Max Against DeepSeek V4

I want to tell you about a rabbit hole I fell into recently. It started the way most of my projects do — someone on a Discord server I frequent asked a simple question: "Should I use Qwen 3 Max or DeepSeek V4 for my internal_compare workflow?" I had opinions, sure, but I wanted real numbers. So I cleared my calendar, fired up a couple of GPU instances, and started benchmarking.

What I found surprised me, and it also reinforced something I've been saying for years: the open source ecosystem is winning, and the walled gardens of the proprietary AI world are starting to look pretty silly.

Let me walk you through what I learned, the actual numbers I got, and why I keep coming back to open weight models with permissive licenses (looking at you, Apache 2.0 and MIT).

Why I Care About This in the First Place