This article was originally published on runaihome.com

TL;DR: Mistral Small 4 is a 119B MoE model with 6B active parameters per token—GPT-4-class in coding and reasoning, multimodal, and fully open-weight. The problem is Q4_K_M quantization lands at ~74 GB, so no single consumer GPU gets you there. Your two realistic local paths are three RTX 4090s (GPU cost alone: ~$3,300–5,500 depending on new vs. used) or a Mac Studio M3 Ultra with 96 GB ($3,999). For most readers, the Mistral API at $0.15/M input tokens removes all of this friction.

3× RTX 4090

Mac Studio M3 Ultra 96 GB

RunPod H100 PCIe