Three weeks ago I ran a model showdown — twelve tasks, five models, one RTX 5090 — and Qwen3.5-35B-A3B won. 85.3 weighted score, 206 tok/s, fits in VRAM with room to spare. I switched it to the default and figured I was done.

I was not done.

This is what two weeks of actually living with Qwen looked like: the config work I had to do before it was usable, the incident that almost killed the experiment, and the ergonomic gap that means frontier models still own my serious work.

Making It Actually Work

The first day I switched Qwen to the default model in OpenClaw, something was wrong. Responses showed raw <think>...</think> tags in the visible output. Tool calls came back as plain text — create_workspace, just sitting there — instead of proper OpenAI-compatible tool_calls objects. The bot was trying to call tools. It just wasn't calling them.