Snowflake compared GLM-5.2 and Opus 4.7 in a hands-on benchmark. The Chinese model held its own.

The test covered 103 tasks in which the models had to write code that works on both DuckDB and Snowflake, and each task was run three times. Given three attempts per task, the two were neck and neck: 66% vs. 67% of tasks solved.

First-attempt accuracy diverges more: Opus hit 53.7%, GLM only 47.6%, a sign that GLM's output is less consistent. The Chinese model also averaged 99 runs per task versus 80 for Opus and burned through 860 million tokens, nearly double Opus's 439 million.
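The gap between the two scores is easier to see with a toy example. The sketch below assumes that "first-attempt accuracy" means the average single-run success rate (often called pass@1) and that the three-attempt figure counts a task as solved if any of its runs succeeds. The function names and the four-task outcomes are hypothetical illustrations, not Snowflake's data or scoring code.

```python
from typing import Sequence


def pass_at_1(results: Sequence[Sequence[bool]]) -> float:
    """Mean single-run success rate across every run of every task."""
    runs = [ok for task in results for ok in task]
    return sum(runs) / len(runs)


def solved_in_any_run(results: Sequence[Sequence[bool]]) -> float:
    """Share of tasks where at least one recorded run succeeded."""
    return sum(any(task) for task in results) / len(results)


# Hypothetical outcomes: four tasks, three runs each (not real benchmark data).
# "consistent" solves the same three tasks on every run.
consistent = [[True, True, True], [True, True, True], [True, True, True], [False, False, False]]
# "flaky" reaches the same three tasks, but only on some of its runs.
flaky = [[True, False, False], [False, True, True], [True, True, False], [False, False, False]]

for name, results in [("consistent", consistent), ("flaky", flaky)]:
    print(name, round(pass_at_1(results), 3), solved_in_any_run(results))

# Token ratio from the article's figures: 860M vs. 439M.
print(round(860 / 439, 2))
```

Both hypothetical models solve 75% of tasks when any run counts, but the flaky one succeeds on only about 42% of individual runs versus 75%. The article's GLM and Opus numbers show the same pattern on a smaller scale. The last line confirms that 860 million tokens is about 1.96 times 439 million.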

Opus 4.7 is the better model, but GLM is competitive in Snowflake's code benchmark and costs far less. (Image: via X)

GLM's strength is validating code reliably across both platforms (DuckDB and Snowflake) at the same time. According to Snowflake CEO Sridhar Ramaswamy, that's why only GLM could solve certain tasks.

Its weaknesses are giving up too early and obsessively checking the wrong things. On one task, GLM fired off 411 tool calls in 24 minutes, checking row counts, distributions, null values, and column types, and still failed all three attempts. Opus solved the same task with 49 calls in 9 minutes.