Building Zero-Shared-State Auth Middleware and Real-Time Whisper STT Pipeline for Voice AI

I recently built a production-grade real-time Voice AI workspace from scratch. While the whole system has many moving parts, two components required the most careful engineering: the authentication middleware between services and the Speech-to-Text (STT) pipeline.

Here’s exactly how I approached and solved both.

The Middleware Problem

I needed two local microservices — a WebRTC audio server and a FastMCP server — to communicate securely.

I didn’t want to introduce a database, Redis, or any hardcoded secrets. The solution had to be lightweight, stateless, and still reasonably secure for internal communication.
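To make these constraints concrete, here is a minimal sketch of one common stateless pattern. Each caller holds a short-lived token signed with HMAC-SHA256, and the receiver verifies it by recomputing the signature, so neither service needs a database or Redis lookup. This is an illustration and not necessarily the approach used in this project. The token layout (`service.expiry.signature`) and the helper names `issue_token` and `verify_token` are hypothetical. The sketch also assumes both processes receive the same random key at launch (for example, through an environment variable set by whatever starts them) instead of a key hardcoded in source.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical token format "<service>.<expiry>.<signature>": an illustrative
# assumption, not the article's actual scheme.


def _sign(secret: bytes, payload: str) -> str:
    """Return an unpadded URL-safe base64 HMAC-SHA256 signature of the payload."""
    digest = hmac.new(secret, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")


def issue_token(secret: bytes, service: str, ttl_seconds: int = 60,
                now: Optional[float] = None) -> str:
    """Create a short-lived signed token; nothing is stored anywhere."""
    issued = time.time() if now is None else now
    expiry = int(issued) + ttl_seconds
    payload = f"{service}.{expiry}"
    return f"{payload}.{_sign(secret, payload)}"


def verify_token(secret: bytes, token: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the service name if the token is authentic and unexpired, else None."""
    try:
        # rsplit keeps any dots inside the service name intact
        service, expiry_str, signature = token.rsplit(".", 2)
        expiry = int(expiry_str)
    except ValueError:
        return None  # malformed token
    expected = _sign(secret, f"{service}.{expiry_str}")
    # compare_digest performs a constant-time comparison
    if not hmac.compare_digest(signature, expected):
        return None
    current = time.time() if now is None else now
    if current > expiry:
        return None  # token has expired
    return service


if __name__ == "__main__":
    # Assumption: in practice the key would be injected at launch, not generated here.
    key = secrets.token_bytes(32)
    token = issue_token(key, "webrtc-audio", ttl_seconds=30)
    print(verify_token(key, token))
```

Because each token carries its own expiry and signature, verification depends only on the token, the key, and the clock. The trade-off is that a token cannot be revoked before it expires, which is why the TTL in a scheme like this is usually kept short.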
