Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory

Google DeepMind released Quantization-Aware Training (QAT) checkpoints for the Gemma 4 family. The release targets local deployment on edge devices and consumer GPUs. It follows the Gemma 4 launch in April and a 12B model two days earlier.

We compared the available Gemma 4 edge-model formats using only published numbers. The goal was simple. Show what each precision level costs in memory. Then show what QAT actually changes.

What QAT actually does

Quantization shrinks a model by lowering weight precision. Standard Post-Training Quantization (PTQ) compresses a finished model. That often degrades quality. QAT instead simulates quantization during training. The model learns to compensate for the precision loss.

Google’s AI team states its QAT results yield higher overall quality than standard PTQ baselines. Google did not publish Gemma 4 QAT benchmark scores in the announcement. For context, Gemma 3 QAT cut the Q4_0 perplexity drop by 54% using llama.cpp evaluation. We cite that only as prior-generation precedent.

We compared the available Gemma 4 edge-model formats using only published numbers. The goal was simple. Show what each precision level costs in memory. Then show what QAT actually changes.

What QAT actually does

Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory

Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory

Other newsrooms on this story

Related reading

Google Ships Gemma 4 QAT Checkpoints: Quantization-Aware Training

Gemma 4 QAT models: Optimizing model compression for mobile and laptop…

Gemma 4: A Practical Guide for Developers

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

Google’s Gemma 4 12B Shows AI Race Moving to Edge Devices

Google launches Gemma 4 AI model designed for laptops

Other newsrooms on this story

Related reading

Google Ships Gemma 4 QAT Checkpoints: Quantization-Aware Training

Gemma 4 QAT models: Optimizing model compression for mobile and laptop…

Gemma 4: A Practical Guide for Developers

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

Google’s Gemma 4 12B Shows AI Race Moving to Edge Devices

Google launches Gemma 4 AI model designed for laptops