02 — Hardware Guide: What Do You Actually Need?

The Most Important Thing to Know

VRAM is the bottleneck, not compute.

A model running at Q4 quantization on a 5-year-old RTX 3060 gives you about 96% of the quality of the same model at full precision on an A100. It is just slower. And "slower" for most use cases (chat, coding, document analysis) still means 20-40 tokens per second, which is faster than most people read.
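To make the VRAM point concrete, here is a rough back-of-the-envelope sketch of whether a model fits on a given card. It is an estimate, not a measurement. The figures it uses are assumptions for illustration: roughly 4.5 bits per weight for a Q4-style quantization, a flat 20% allowance for KV cache and runtime buffers, and a 12 GB card (the common desktop RTX 3060 configuration). The function names are hypothetical, and real usage varies with context length and runtime.

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.5,
                     overhead_fraction: float = 0.2) -> float:
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes).

    bits_per_weight and overhead_fraction are illustrative assumptions,
    not values taken from any particular runtime.
    """
    # Memory for the quantized weights alone.
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    # Flat allowance for KV cache, activations, and runtime buffers.
    total_bytes = weight_bytes * (1 + overhead_fraction)
    return total_bytes / 1e9


def fits_in_vram(params_billion: float, vram_gb: float) -> bool:
    """True if the estimated footprint fits in the card's VRAM."""
    return estimate_vram_gb(params_billion) <= vram_gb


if __name__ == "__main__":
    card_vram_gb = 12.0  # assumed: 12 GB card
    for size in (7, 13, 70):
        need = estimate_vram_gb(size)
        verdict = "fits" if fits_in_vram(size, card_vram_gb) else "does not fit"
        print(f"{size}B at Q4: ~{need:.1f} GB, {verdict} in {card_vram_gb:.0f} GB")
```

Under these assumptions, 7B and 13B models fit on a 12 GB card while a 70B model does not. The card's compute speed never enters the calculation: it only changes how fast tokens come out, not whether the model loads.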

The Quick Decision Tree