This stack uses Ollama to run the Gemma 4 12B QAT model on a laptop GPU with 10 GB of VRAM. The QAT (quantization-aware training) checkpoints are trained to tolerate low-precision weights, which cuts memory use enough for the model to run locally.
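
As a minimal sketch of how an application might talk to this stack, the snippet below sends one prompt to a locally running Ollama server through its /api/generate endpoint on the default port 11434. The model tag is a placeholder and an assumption, not the real tag name: substitute whatever `ollama list` reports on your machine. The helper names build_request and generate are hypothetical, chosen only for illustration.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Placeholder tag (assumption): replace with the name shown by `ollama list`.
MODEL_TAG = "your-gemma-qat-tag"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(generate("Rewrite this note as three short bullet points: ..."))
```

Setting "stream" to False asks the server for a single JSON reply instead of a stream of chunks, which keeps the sketch short. Because the request only goes to localhost, the prompt and the reply stay on the laptop.
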
What you get
Local Gemma 4 12B inference on hardware with 10 GB of VRAM
QAT checkpoints that fit the model into about 6.7 GB of VRAM
A laptop-friendly private AI stack for writing, notes, and prompts
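
The memory figures above can be sanity-checked with some rough arithmetic. The sketch below assumes 4-bit weights, which the text does not state, and uses decimal gigabytes; the function names are hypothetical. It estimates weight storage only, so the gap between the result and the quoted 6.7 GB would have to come from tensors kept at higher precision and runtime overhead, which this sketch does not model.

```python
GB = 1e9  # decimal gigabytes, used for a rough estimate


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GB."""
    return n_params * bits_per_weight / 8 / GB


def headroom_gb(vram_gb: float, model_gb: float) -> float:
    """VRAM left over for the KV cache, activations, and the desktop."""
    return vram_gb - model_gb


if __name__ == "__main__":
    n_params = 12e9  # 12B parameters, from the text

    # Assumption: 16-bit weights for the unquantized baseline.
    print(f"16-bit weights: {weight_memory_gb(n_params, 16):.1f} GB")  # 24.0 GB
    # Assumption: 4-bit weights for the QAT checkpoint.
    print(f"4-bit weights:  {weight_memory_gb(n_params, 4):.1f} GB")   # 6.0 GB
    # Figures quoted in the text: 10 GB of VRAM, about 6.7 GB for the model.
    print(f"Headroom:       {headroom_gb(10.0, 6.7):.1f} GB")          # 3.3 GB
```

Under these assumptions a 16-bit copy of the weights would not fit in 10 GB at all, while the quantized model leaves roughly 3.3 GB free. That remainder is shared with the KV cache, which grows with context length, so long prompts reduce the margin.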