Adding Gemma 4 speech recognition to a .NET desktop app: the llama-server sidecar that survived

In April 2026 Google shipped Gemma 4, a multimodal model with a native audio path. I wanted to add it to Parlotype, my .NET 10 dictation app, as a second speech engine alongside Whisper. Four runtime paths got cut before I landed on llama.cpp's llama-server as a child process. This post walks through the cuts, the architecture that survived, the variant catalog, and the benchmarks.
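For readers who haven't run llama-server as a sidecar, here is a minimal sketch of the general pattern. The host app spawns the server as a child process bound to loopback, polls its `/health` endpoint until the model has finished loading, and kills the process if it never becomes ready. This is not Parlotype's code. It is written in Python for brevity, and the same shape maps onto `System.Diagnostics.Process` and `HttpClient` in .NET. `-m`, `--mmproj`, `--host`, `--port` and `GET /health` are llama-server features. Everything else is an illustrative assumption: the helper names, the file paths, the port, and whether a Gemma 4 audio setup needs a separate `--mmproj` projector file at all (the sketch treats that file as optional).

```python
import subprocess
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable, List, Optional


def build_llama_server_argv(exe: Path, model: Path, mmproj: Optional[Path], port: int) -> List[str]:
    """Assemble the llama-server command line (hypothetical helper)."""
    argv = [str(exe), "-m", str(model)]           # main GGUF weights
    if mmproj is not None:
        argv += ["--mmproj", str(mmproj)]         # assumed: separate multimodal projector file
    argv += ["--host", "127.0.0.1",               # loopback only: the desktop app is the sole client
             "--port", str(port)]
    return argv


def wait_until_ready(base_url: str, timeout_s: float = 60.0,
                     probe: Optional[Callable[[str], bool]] = None) -> bool:
    """Poll GET /health until it returns 200 or the timeout expires."""
    def default_probe(url: str) -> bool:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # Not listening yet, or still loading the model (HTTPError 503).
            return False

    check = probe or default_probe
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check(base_url + "/health"):
            return True
        time.sleep(0.25)
    return False


def start_sidecar(argv: List[str], base_url: str, timeout_s: float = 60.0) -> subprocess.Popen:
    """Launch the server, block until it is healthy, and clean up on failure."""
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if not wait_until_ready(base_url, timeout_s):
        proc.kill()   # don't leave an orphaned server holding the port or VRAM
        proc.wait()
        raise RuntimeError("llama-server did not become ready in time")
    return proc
```

A call would look like `start_sidecar(build_llama_server_argv(Path("llama-server.exe"), Path("model.gguf"), None, 8081), "http://127.0.0.1:8081")`. Once it returns, the app talks to the server over plain HTTP on loopback. Polling `/health` rather than sleeping for a fixed time matters because load time varies a lot with model size and disk speed.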

Parlotype is a voice-to-text desktop app for Windows with on-device speech recognition as the default. You hold a global hotkey, speak, release. Text appears in whatever app you were typing into. This post is about adding a second on-device engine. Cloud speech providers are a separate, opt-in track and not the subject here.

This is the long companion to my Gemma 4 Challenge submission on the same topic. The challenge post is the 5-variant tour and the shipping decision; this one covers how I picked the runtime and the architecture underneath it.

The constraints

Worth naming the constraints up front, so the obvious answers make sense as dead ends:
