What did gemma see? - Thinking in comments...


Wednesday, May 20, 2026


This is a submission for the Gemma 4 Challenge: Write About Gemma 4

While running a simple harness around the HumanEval benchmark problems to test local models, I was surprised to see gemma4:26b become the first local model to pass the controversial HumanEval/145 question.

Not only had gemma4:26b solved it, it was also the only model to score 164/164, a perfect run.
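For readers who have not seen how a HumanEval-style run is scored, here is a minimal sketch. It is not the harness used for these runs: `run_problem`, `score`, and `toy_problem` are hypothetical names, and the toy problem is invented for illustration. The only thing taken from HumanEval itself is the record shape (`task_id`, `prompt`, `entry_point`, and a `test` string that defines `check(candidate)`). It also assumes the model returns just the function body rather than a full function.

```python
# Minimal sketch of a HumanEval-style pass/fail check (illustrative only).

def run_problem(problem: dict, completion: str) -> bool:
    """Return True if prompt + completion passes the problem's check()."""
    program = problem["prompt"] + completion + "\n" + problem["test"]
    namespace: dict = {}
    try:
        # WARNING: this executes model-generated code in-process.
        # A real harness should sandbox it and enforce a timeout.
        exec(program, namespace)
        namespace["check"](namespace[problem["entry_point"]])
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all count as a fail.
        return False
    return True


def score(problems: list, completions: dict) -> int:
    """Count how many problems pass, given completions keyed by task_id."""
    return sum(run_problem(p, completions[p["task_id"]]) for p in problems)


# Hypothetical toy problem in the same shape as a HumanEval record.
toy_problem = {
    "task_id": "Toy/0",
    "prompt": 'def add(a, b):\n    """Return a + b."""\n',
    "entry_point": "add",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
}

if __name__ == "__main__":
    completions = {"Toy/0": "    return a + b\n"}
    print(score([toy_problem], completions), "of", 1, "passed")
```

Under this scheme a "164/164" result means every one of the 164 problems passed its `check()` function; a single failed assertion anywhere in a problem's test fails that problem.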

I hadn't seen a single pass on HumanEval/145 in any of the ~50 runs with other models from the Gemma, Qwen, DeepSeek, Mistral, Granite, LLaMA, OLMo, Nemotron, and other families. Why?

HumanEval Leaderboard


Related reading

I Gave Gemma 4 150 Tools on Windows. Here's What Actually Happened.

I Fine-Tuned Gemma 4 on an Emotion Dataset Using a Single GPU

I was trying to Learning About Gemma 4 and It was pretty good

Gemma 4 Is Not Just Another Open Model — It Changes What Developers Can Build…

Running Gemma 4 on a Modest Machine: Unsloth vs LM Studio vs llama.cpp vs Ollama

Gemma 4 on 16GB RAM: What Actually Works for Structured AI Workflows
