Cloud TTS Chirp3-HD with Caching: Fixing Voice Readout for Accessibility
As a solo developer, I need to keep the product lean and accessible. A recent request highlighted a critical need: having AI chat responses read aloud. This wasn't just about adding a feature; it was about making the platform usable for someone with a visual impairment, specifically a user's mother-in-law who struggles to read text on screen. The initial thought was simple text-to-speech (TTS), but implementing it well, especially on a single small VM, presented several engineering challenges.
The Genesis: A Need for Voice
The request was clear: "When I ask a question via text, if I don't have time to check it, let me hear the answer via voice." This told me right away that the need wasn't real-time conversational voice, but a playback feature for text responses that already exist. The distinction matters for both architecture and cost: audio for an existing response can be generated on demand and reused, rather than produced live as part of a conversation.
Design Iterations: From Browser to Cloud