Run Gemma-4 E2B-it with llama.cpp on Raspberry Pi 4

Sunday, May 31, 2026

TL;DR

Gemma-4 E2B-it (Q4_K_M) tested on Raspberry Pi 4 with llama.cpp reaches ~1.8 tokens per second (t/s) during generation, which is too slow for agentic workloads; LFM2.5-8B drops to just 0.5 t/s. At ~$305, a Pi 5 loses on price/performance to a $300–400 mini PC with 16 GB RAM for sub-10B local inference.
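To put those throughput figures in perspective, here is a rough back-of-the-envelope sketch of how long a single reply would take at each rate. The 256-token reply length is an assumption chosen for illustration, not a number from the test, and the estimate ignores prompt processing time.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Generation rates quoted in the summary above (tokens per second).
RATES = {
    "Gemma-4 E2B-it (Q4_K_M)": 1.8,
    "LFM2.5-8B": 0.5,
}

# Assumption: a medium-length reply of 256 tokens.
REPLY_TOKENS = 256

for name, rate in RATES.items():
    secs = generation_seconds(REPLY_TOKENS, rate)
    print(f"{name}: {secs:.0f} s (~{secs / 60:.1f} min) for {REPLY_TOKENS} tokens")
```

Under that assumption a reply takes roughly 2.4 minutes at 1.8 t/s and about 8.5 minutes at 0.5 t/s. An agent that chains several such replies per task would therefore spend most of its time waiting on generation.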

Tested Gemma-4 E2B-it on Raspberry Pi 4.

How to convert Gemma-4 E2B-it to GGUF
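The post does not spell out the conversion steps, so the following is only a minimal sketch of the usual llama.cpp workflow: convert the Hugging Face checkpoint to an f16 GGUF with convert_hf_to_gguf.py, then quantize it to Q4_K_M with llama-quantize. The directory names, output file names, and the build/bin location of the binary are assumptions for illustration. The script only builds and prints the commands; it does not run them.

```python
import shlex
from pathlib import Path


def build_conversion_commands(llama_cpp_dir, hf_model_dir, out_dir, quant="Q4_K_M"):
    """Return the two command lines: HF -> f16 GGUF, then f16 -> quantized GGUF."""
    llama_cpp = Path(llama_cpp_dir)
    out = Path(out_dir)
    f16_path = out / "model-f16.gguf"         # hypothetical file name
    quant_path = out / f"model-{quant}.gguf"  # hypothetical file name

    # Step 1: llama.cpp's converter script writes an unquantized f16 GGUF.
    convert = [
        "python3", str(llama_cpp / "convert_hf_to_gguf.py"),
        str(hf_model_dir),
        "--outfile", str(f16_path),
        "--outtype", "f16",
    ]
    # Step 2: llama-quantize takes input file, output file, quantization type.
    # Assumption: llama.cpp was built with CMake, so binaries sit in build/bin.
    quantize = [
        str(llama_cpp / "build" / "bin" / "llama-quantize"),
        str(f16_path),
        str(quant_path),
        quant,
    ]
    return [convert, quantize]


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the checkpoint was downloaded.
    for cmd in build_conversion_commands("llama.cpp", "gemma-4-E2B-it", "gguf-out"):
        print(shlex.join(cmd))
```

Quantizing on a desktop or laptop and copying only the final Q4_K_M file to the Pi is likely the more practical route, since the f16 intermediate is several times larger than the quantized model.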

models

https://huggingface.co/baxin/gemma-4-E4B-it-E2B-it-Q4_K_M

llama.cpp
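The exact command used for the measurement is not given in the post. As a hedged sketch, llama.cpp ships a llama-bench tool that reports prompt-processing and generation speed in tokens per second; the snippet below builds such an invocation. The model file name and the token counts are assumptions, and the thread count of 4 simply matches the Pi 4's four CPU cores.

```python
def build_bench_command(model_path, threads=4, prompt_tokens=512,
                        gen_tokens=128, binary="llama-bench"):
    """Build a llama-bench invocation as an argument list."""
    return [
        binary,
        "-m", str(model_path),     # GGUF model file
        "-t", str(threads),        # CPU threads
        "-p", str(prompt_tokens),  # prompt-processing test length
        "-n", str(gen_tokens),     # generation test length
    ]


if __name__ == "__main__":
    # Hypothetical file name for the quantized model.
    cmd = build_bench_command("gemma-4-E2B-it-Q4_K_M.gguf")
    print(" ".join(cmd))
```

The generation figure from such a run is the number to compare against the ~1.8 t/s quoted in the summary.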

Related reading

Running Gemma 4 Locally with Ollama and OpenCode

Quantizing Gemma 4 on Mac with llama.cpp

Porting Gemma-4 (2B / 4B / 12B) to AWS Inferentia2

Running Gemma 4 on a Modest Machine: Unsloth vs LM Studio vs llama.cpp vs Ollama

I Ran Every Gemma 4 Model on My Home Lab. E4B Crushes E2B. Here's the Data.

Breathing Life into the Pi: Deploying Gemma 4 2B on a Raspberry Pi 5
