Running Brand-New Gemma 4 12B on an 8-Year-Old GTX 1080 Ti: Speed, 3 Gotchas, and Why Q8 Beat Q4 on My Own Field

TL;DR (Quick Answer)

Gemma 4 12B just dropped, so I ran it on a GTX 1080 Ti (Pascal, 2017) to see what an 8-year-old card does with a 2026 model. Real numbers, and a few honest surprises:

Speed: ~28 tok/s at Q4_K_M on a single 1080 Ti (~8 GB of VRAM used). The 12B model fits on one card, so the second GPU sits idle.

Three things broke before it worked: the GGUF is multimodal and its vision projector crashes Ollama; it's a reasoning model that hides its answer in a thinking channel; and Q4 produces visible token glitches.

The interesting part: Q4 vs Q8. I asked it real bioinformatics questions. At Q4 it handled concepts and code well but got a niche method (the HEIDI test) confidently backwards, with garbled characters sprinkled through the output. Switching to Q8_0 (12.7 GB, split across both 1080 Tis, ~30% slower at ~19.5 tok/s) removed the glitches and fixed the wrong answer.
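To make the Q4 vs Q8 trade-off concrete, here is a minimal Python sketch of the arithmetic behind the "~30% slower" figure and the reason Q8_0 has to be split across two cards. The throughput and file-size numbers come from the measurements above. The 11 GB per-card VRAM figure is the GTX 1080 Ti's spec, which the post does not state, and the helper names are hypothetical. The fit check is deliberately rough: it ignores KV cache and runtime overhead.

```python
# Figures quoted in the post.
Q4_TPS = 28.0        # tok/s at Q4_K_M on one GTX 1080 Ti
Q8_TPS = 19.5        # tok/s at Q8_0 split across two GTX 1080 Tis
Q8_SIZE_GB = 12.7    # Q8_0 model size

# Assumption: 11 GB is the GTX 1080 Ti spec, not a number from the post.
CARD_VRAM_GB = 11.0


def slowdown_pct(baseline_tps: float, new_tps: float) -> float:
    """Percent drop in throughput relative to the baseline."""
    return (1.0 - new_tps / baseline_tps) * 100.0


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady `tps`."""
    return tokens / tps


def fits_on_one_card(model_gb: float, card_vram_gb: float) -> bool:
    """Rough check only: ignores KV cache and runtime overhead."""
    return model_gb <= card_vram_gb


if __name__ == "__main__":
    print(f"Q8 vs Q4 slowdown: {slowdown_pct(Q4_TPS, Q8_TPS):.1f}%")
    print(f"500 tokens at Q4: {seconds_for(500, Q4_TPS):.1f} s")
    print(f"500 tokens at Q8: {seconds_for(500, Q8_TPS):.1f} s")
    print(f"Q8_0 fits on one card: {fits_on_one_card(Q8_SIZE_GB, CARD_VRAM_GB)}")
```

Under these assumptions the slowdown works out to roughly 30%, and the 12.7 GB Q8_0 file alone exceeds a single card's memory, which is consistent with the two-GPU split described above.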


