Originally posted (in Spanish) on my blog: pereyra.ar/blog/clau-tg

TL;DR

For months I'd been building scattered labs in my homelab: a multimodal RAG, a WhatsApp router, a GPU switchboard, Claude Code running on K3s, a realtime lip-sync avatar. Each one solved a single thing and lived on its own. This week I put them all behind a single voice agent, clau, and I talk to it quite literally: through a Telegram phone call.

The interesting part wasn't the voice agent itself (there are a thousand of those). It was that, once all the labs were connected, clau stopped being "a voice bot" and turned into the primary interface for operating my infra. I talk to it, it sees my screen, it shows me its own, it runs kubectl against my cluster, it browses the web, and when something is out of its reach it delegates to another agent with more permissions and comes back with the answer, spoken.

This post covers the architecture, the "voice agent as an orchestration layer" pattern, and the honest tradeoffs of building this on-prem.
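
As a rough preview of that pattern, here is a minimal sketch of a voice agent that runs the tools it has locally and hands anything out of its reach to a more privileged agent, then speaks the result. This is not clau's actual code: the `Agent` class, the `speak` helper, the tool names, and the stubbed tool bodies are all hypothetical and only illustrate the shape of the delegation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# A tool takes a text argument and returns a text result (assumed shape).
Tool = Callable[[str], str]


@dataclass
class Agent:
    """Hypothetical agent: a set of tools plus an optional privileged fallback."""
    name: str
    tools: Dict[str, Tool] = field(default_factory=dict)
    delegate: Optional["Agent"] = None

    def handle(self, tool: str, arg: str) -> str:
        # Run the tool locally when this agent owns it.
        if tool in self.tools:
            return self.tools[tool](arg)
        # Otherwise pass the request up to the agent with more permissions.
        if self.delegate is not None:
            return self.delegate.handle(tool, arg)
        return f"no agent can run {tool!r}"


def speak(text: str) -> str:
    """Stand-in for text-to-speech; a real agent would synthesize audio here."""
    return f"[spoken] {text}"


# Stubs only: nothing here touches a real cluster or a real secret store.
privileged = Agent(
    name="ops",
    tools={"rotate_secret": lambda arg: f"stub: rotated {arg}"},
)
voice = Agent(
    name="clau",
    tools={"kubectl": lambda arg: f"stub: kubectl {arg}"},
    delegate=privileged,
)

local_reply = speak(voice.handle("kubectl", "get pods"))
delegated_reply = speak(voice.handle("rotate_secret", "db"))
```

In this sketch the caller never needs to know which agent did the work: `voice.handle` returns text either way, and the voice layer only has to speak it.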