The decision you don't realize you're making

You sit down to wire Gemma 4 into a local agent loop — a Claude-Code-style tool-using harness, a long-context code reviewer, an offline research assistant. Google has handed you four architectures from the same release. The contest framing nudges you toward an obvious read:

E2B and E4B (effective-parameter) models for phones, browsers, and a Pi 5.

31B dense as the on-prem workhorse.

26B MoE as the efficient one — the mixture-of-experts variant that activates a fraction of its parameters per token and claims better quality per FLOP.