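This stack uses Ollama to run a QAT (quantization-aware training) build of Gemma 4 12B on a laptop GPU with 10 GB of VRAM. QAT checkpoints are trained to tolerate low-precision weights, so they need far less memory than the full-precision model, which makes local inference practical on this class of hardware.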

What you get

Local Gemma 4 12B inference on a GPU with 10 GB of VRAM

QAT compression that fits the model into ~6.7 GB VRAM

A laptop-friendly private AI stack for writing, notes, and prompts
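
To make the points above concrete, here is a minimal Python sketch of the memory budget and of a prompt sent to a locally running Ollama server. It assumes Ollama is listening on its default port (11434) and uses its `/api/generate` endpoint. The model tag `gemma4:12b-it-qat` is a hypothetical placeholder, and the helper names are illustrative only; run `ollama list` to see the tag actually installed on your machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4:12b-it-qat"  # hypothetical tag (assumption); check `ollama list`


def vram_headroom_gb(total_gb: float = 10.0, model_gb: float = 6.7) -> float:
    """VRAM left for the KV cache and runtime overhead once the weights are loaded."""
    return round(total_gb - model_gb, 1)


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Summarize these notes in three bullet points.")` returns the model's reply as a string, and `vram_headroom_gb()` gives the roughly 3.3 GB that remains on a 10 GB card for context and overhead. Nothing leaves the machine, since the request goes to `localhost`.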