Qwen 3.6 35B-A3B for Local AI in 2026: The 24GB VRAM Line That Gets You 120 tok/s

Thursday, June 11, 2026

1,204 words · ~5 min read

This article was originally published on runaihome.com

TL;DR: Qwen 3.6 35B-A3B is a Mixture-of-Experts model that activates only 3B parameters per token, so per-token compute is that of a 3B model. All 35B parameters must still sit in VRAM at once, which sets a hard 24GB floor. On a 24GB RTX 4090 it reaches 120 tok/s at Q4_K_M; a used RTX 3090 gets 107 tok/s with Ollama, or 135.7 tok/s with a properly tuned llama.cpp config. A 16GB card needs CPU offloading, which cuts speed by half.
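The 24GB floor follows from simple arithmetic on the total parameter count, not the active one. The sketch below is a rough estimate only. The bits-per-weight figure for Q4_K_M (about 4.85) and the 2 GB allowance for KV cache and runtime buffers are assumptions made for illustration, not measurements from this article, and the function names are hypothetical.

```python
# Rough VRAM estimate for a quantized MoE model.
# ASSUMPTIONS (not from the article): Q4_K_M averages ~4.85 bits per weight,
# and KV cache plus runtime buffers need ~2 GB. Real numbers vary with
# context length, runtime, and the exact quantization mix.

TOTAL_PARAMS = 35e9         # every expert must be resident in VRAM
ACTIVE_PARAMS = 3e9         # parameters actually used per token
ASSUMED_BPW = 4.85          # assumed average bits per weight for Q4_K_M
ASSUMED_OVERHEAD_GB = 2.0   # assumed KV cache + buffers


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits_in_vram(params: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float) -> bool:
    """True if weights plus the assumed overhead fit in the given VRAM."""
    return weight_footprint_gb(params, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    resident = weight_footprint_gb(TOTAL_PARAMS, ASSUMED_BPW)   # ~21.2 GB
    touched = weight_footprint_gb(ACTIVE_PARAMS, ASSUMED_BPW)   # ~1.8 GB
    print(f"weights resident in VRAM: {resident:.1f} GB")
    print(f"weights read per token:   {touched:.1f} GB")
    for vram in (16, 24):
        ok = fits_in_vram(TOTAL_PARAMS, ASSUMED_BPW, vram, ASSUMED_OVERHEAD_GB)
        print(f"{vram} GB card fits fully in VRAM: {ok}")
```

Under these assumptions the weights alone come to roughly 21 GB, which fits a 24GB card with little headroom and overflows a 16GB card, while each token touches only about 1.8 GB of weights. That gap is why the model is fast once it fits.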

RTX 4090 24GB

RTX 3090 24GB

Mac Mini M4 Pro 24GB

Related reading

Doubling Qwen3.6-27B on One RTX 3090: ollama llama.cpp + MTP, Lever by Lever…

Qwen3-Coder-Next for Local AI in 2026: Which GPU Can Actually Run Alibaba's #1…

Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

Qwen Is Not Yet Ready to Power Local OpenClaw Deployments

Alibaba's new open source Qwen3-235B-A22B-2507 beats Kimi-2 and offers low…

Kimi K2.6 for Local AI in 2026: What VRAM and System RAM You Need to Actually…
