On the Apex Math Reasoning benchmark, Qwen3.7-Max scored 44.5, eclipsing Claude Opus-4.6 Max's score of 34.5 and DeepSeek V4-Pro Max's 38.3.

Alibaba's Qwen 3.7 Max landed on Arena AI five days before the Cloud Summit and earned its spot. We tested it, and here are the results.

Alibaba's Qwen 3.7 Max-Preview ranks 13th globally in text and 16th in vision on LM Arena. The model focuses on math, coding, and reasoning.