A raw, developer-first look at Google’s new open-weight Gemma 4 family—featuring a hands-on local Python setup, a comparison of the 2B, 9B, and 31B variants, and the brutal math of the 128K context window VRAM consumption.

The Local AI Hype vs. The VRAM Reality

Every major AI release follows the same cycle. A marketing flash, a flurry of bench-marking charts showing a new model "beating" closed models, and a rush of developers trying to figure out how to actually run it locally without melting their graphics cards.

Google’s release of Gemma 4 is no exception.

As Google’s most capable open-weight model family yet, Gemma 4 is genuinely impressive. It introduces native multimodal vision support, a massive 128K context window, and advanced reasoning capabilities that rival closed proprietary models. Even better, Google provides model weights across a wide spectrum: from a lightweight 2B model that runs on phones and Raspberry Pis, up to a highly capable 31B model that competes directly with enterprise cloud models.