Google DeepMind Releases Gemma 4 12B: An Encoder-Free Multimodal Model with Native Audio That Runs on a 16 GB Laptop

Google DeepMind just released Gemma 4 12B, a dense multimodal model that strips out traditional encoders entirely. Vision and audio flow straight into the LLM backbone. The result is a model that runs agentic workflows on a consumer laptop with 16 GB of RAM. It ships under the Apache 2.0 license.

Model Overview & Access

Gemma 4 12B is a 12-billion-parameter decoder-only transformer. It handles text, images, audio, and video natively. There are no separate vision or audio encoders. The decoder uses the same structure as the Gemma 4 31B Dense model. It bridges the gap between the edge-friendly E4B and the larger 26B Mixture of Experts variant.

Architecture: Unified, encoder-free decoder-only transformer.

Modalities: Text, image, video, and native audio input — the first mid-sized Gemma with audio.
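
To put the 16 GB laptop claim in context, here is a rough back-of-the-envelope sketch of how much memory 12 billion weights would occupy at a few common storage precisions. The precisions listed are illustrative assumptions; the article does not say which format the model ships in, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
# Rough weight-memory estimate for a 12-billion-parameter model.
# The precisions below are illustrative assumptions, not published details.

PARAMS = 12_000_000_000  # "12B" from the article, taken as exactly 12e9
RAM_GIB = 16.0           # laptop memory from the article, treated as 16 GiB

# Bytes per parameter at common storage precisions (assumed for illustration).
BYTES_PER_PARAM = {
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_gib(params: int, bytes_per_param: float) -> float:
    """Return weight storage in GiB (2**30 bytes), weights only."""
    return params * bytes_per_param / 2**30


for name, width in BYTES_PER_PARAM.items():
    size = weight_gib(PARAMS, width)
    # Weights alone must fit under the RAM budget; real usage needs headroom.
    print(f"{name}: {size:.1f} GiB, under {RAM_GIB:.0f} GiB: {size < RAM_GIB}")
```

Under these assumptions, 16-bit weights come to roughly 22.4 GiB and would not fit, while 8-bit (about 11.2 GiB) and 4-bit (about 5.6 GiB) weights would. That suggests some form of reduced-precision weights is needed to reach a 16 GB machine, though the exact scheme is not stated here.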
