The benchmark that's getting my attention

A Reddit thread in r/LocalLLaMA this week is buzzing about Qwen3.7 Max getting scored on Artificial Analysis, with the open-weight 27B and 35B variants reportedly still in the "waiting room." I haven't tested 3.7 Max myself yet, and frankly I'd take any single benchmark score with a fistful of salt. Still, the thread is a good excuse to lay out how I pick LLMs and migrate between them.

I've been moving inference workloads between providers for the last 18 months. Three different production projects. Some lessons cost me real money. Here's what I've learned about comparing closed APIs to open-weight models, with code you can actually use.

Why the open-weight question even comes up

When I started, every project just hit a closed API and called it done. Reasonable default. But three things kept pushing me toward open-weight alternatives: