Google's new Gemma 4 open AI model is sized for your laptop

Gemma 4 12B is almost as capable as the version with 26 billion parameters.

Google says the new model can handle complex multi-step reasoning and agentic workflows that previously required the larger Gemma variants. Despite its smaller parameter count, Gemma 4 12B ships with the newly devised Multi-Token Prediction (MTP) drafters, which use otherwise idle processing cycles to predict likely future tokens ahead of time. The result is greater speed and efficiency. Google has released optional MTP versions of the other Gemma 4 models, but this is the first to include MTP out of the box.
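To make the drafting idea concrete, here is a minimal sketch of the generic draft-and-verify loop (often called speculative decoding) that drafter schemes of this kind typically rely on. Google's actual MTP implementation is not described here, so everything in the sketch is an assumption for illustration: the function names, the toy "models" that simply count upward, and the greedy acceptance rule.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(context: List[Token], target_next: NextToken,
                     draft_next: NextToken, k: int = 4) -> List[Token]:
    """Run one draft-and-verify step and return the tokens accepted."""
    # 1. The cheap drafter guesses k future tokens in a row.
    draft: List[Token] = []
    for _ in range(k):
        draft.append(draft_next(context + draft))

    # 2. The full model checks each guess. A real system would score all
    #    k positions in one batched pass; this toy loops for clarity.
    accepted: List[Token] = []
    for guess in draft:
        truth = target_next(context + accepted)
        if guess != truth:
            accepted.append(truth)  # replace the first wrong guess
            return accepted
        accepted.append(guess)

    # 3. Every guess was right, so the check yields one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(prompt: List[Token], target_next: NextToken,
             draft_next: NextToken, n_tokens: int, k: int = 4) -> List[Token]:
    """Generate n_tokens after the prompt using draft-and-verify steps."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(out, target_next, draft_next, k))
    return out[:len(prompt) + n_tokens]


# Hypothetical stand-ins for the two models (assumptions, not Gemma).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[Token]) -> Token:
    # Agrees with the target except after a 5, where it guesses wrong.
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


print(generate([3], toy_target, toy_drafter, n_tokens=12))
```

In this greedy variant the output matches what the full model would have produced on its own, because a drafted token is kept only when the full model agrees with it. The speedup comes from accepting several tokens per verification step whenever the drafter guesses well.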

Gemma 4 12B is also more efficient thanks to a new approach to multimodality. The Gemma 4 family is natively multimodal, accepting text, audio, or images as inputs. Most generative AI models, including the other Gemma 4 variants, use dedicated encoders to process non-text inputs and pass that data to the LLM. This works well enough, but it increases latency and memory usage.

With the new mid-weight model, Google has implemented a streamlined embedding module for vision, built on a single matrix multiplication plus positional embeddings, which lets image data reach the LLM with its spatial layout intact. This eliminates the need for a bulky middleman encoder. For audio, there’s no encoding at all. The developers worked out a method of projecting the raw audio signal into the same vector space used for text tokens.
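A rough sketch of what a single-multiplication vision embedding can look like: cut the image into patches, flatten each patch, multiply by one weight matrix, and add a per-position vector so the model knows where each patch came from. This follows the general patch-embedding pattern, not Google's published code. The patch size, the embedding width, the weights, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Cut an H x W grayscale image into flattened patch x patch tiles."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles: Matrix = []
    for top in range(0, h, patch):          # row-major tile order
        for left in range(0, w, patch):
            tiles.append([image[top + r][left + c]
                          for r in range(patch) for c in range(patch)])
    return tiles


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x d)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def embed_image(image: Matrix, patch: int,
                weight: Matrix, pos: Matrix) -> Matrix:
    """Project patches with one matmul, then add positional embeddings."""
    tiles = patchify(image, patch)
    projected = matmul(tiles, weight)       # the single matrix multiplication
    return [[p + q for p, q in zip(token, where)]
            for token, where in zip(projected, pos)]


# Toy example: a 4 x 4 image of ones, 2 x 2 patches, embedding width 3.
image = [[1.0] * 4 for _ in range(4)]
weight = [[0.5] * 3 for _ in range(4)]              # (patch * patch) x width
pos = [[float(i)] * 3 for i in range(4)]            # one vector per patch
tokens = embed_image(image, patch=2, weight=weight, pos=pos)
print(tokens)
```

All four patches in the toy image are identical, yet each resulting token is different because of the positional term. That is the spatial awareness the text refers to: without it, the LLM could not tell the top-left patch from the bottom-right one.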
