What if your home could understand you — without sending a single word to the cloud?

That question started this project. I wanted to control my smart home with voice commands in Hungarian, a language that sits far outside the English-centric comfort zone of most voice assistants. I wanted context awareness: the system should know which lights are already on and what time of day it is. And I wanted it to be private: no audio recordings uploaded to someone else's servers, no device-state telemetry leaving my network.

What I did not expect was that the journey from cloud to local AI would end with my local setup outperforming the cloud version. This is the full story — with the raw numbers to prove it.

The Problem and Motivation

The cloud version worked. Groq's Whisper API transcribed Hungarian speech reliably, OpenAI's GPT interpreted the commands, and my lights responded in about four seconds. Four seconds is the good news. The bad news is the variance: the same system took anywhere from 2.7 to 9.2 seconds depending on cloud load and network conditions. On a bad day, it felt slow. On a very bad day, like the one data point at 9.2 seconds, it felt broken.