ElevenLabs charges between $5 and $330 per month for voice AI services, and every audio file you process goes through its cloud servers. For those looking for an open-source alternative to ElevenLabs, OmniVoice Studio is a good fit: a desktop application that runs the same categories of tasks locally. It is an interesting individual project that handles voice cloning, video dubbing, real-time dictation, vocal isolation, and speaker diarization without sending data to an external server.
The application bundles six distinct capabilities. Understanding each one helps clarify what the system is doing under the hood.
Voice cloning works from a 3-second audio clip. The system uses zero-shot learning, meaning it can clone a voice that it was never trained on. It does this by conditioning a diffusion-based TTS model on the short reference audio. The underlying model, OmniVoice from k2-fsa, supports 600+ languages.
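
Because the reference clip is the only information the model gets about the target voice, it can help to check the clip's length before using it. The sketch below uses only Python's standard `wave` module and is not part of OmniVoice Studio. The function names are hypothetical, and treating 3 seconds as a hard minimum is an assumption made for illustration.

```python
import wave

# Assumption: treat the 3-second figure as a minimum clip length.
MIN_REFERENCE_SECONDS = 3.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str, min_seconds: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check whether a WAV clip is long enough to serve as a voice reference."""
    return clip_duration_seconds(path) >= min_seconds
```

This only handles uncompressed WAV input; other formats would need to be converted first.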
Voice design lets you build a new voice from parameters (gender, age, accent, pitch, speed, emotion, and dialect) without cloning any existing voice.
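
One way to picture voice design is as a small parameter record. The sketch below is a guess at what such a record could look like, not the application's actual data model. The class name `VoiceDesign`, the field types, the default values, and the rule that pitch and speed are positive multipliers are all assumptions; only the seven parameter names come from the description above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VoiceDesign:
    """Hypothetical container for the seven voice design parameters."""
    gender: str = "neutral"
    age: str = "adult"
    accent: str = "neutral"
    pitch: float = 1.0    # assumed: multiplier relative to a baseline voice
    speed: float = 1.0    # assumed: multiplier relative to normal speaking rate
    emotion: str = "neutral"
    dialect: str = ""

    def __post_init__(self) -> None:
        # Reject values that cannot be interpreted as multipliers.
        if self.pitch <= 0 or self.speed <= 0:
            raise ValueError("pitch and speed must be positive")


# Example: describe a voice without referencing any recording.
narrator = VoiceDesign(gender="female", age="senior", accent="british", speed=0.9)
settings = asdict(narrator)  # plain dict, easy to serialize or pass along
```

The point of the record is that nothing in it refers to a recording of a real person, which is what separates voice design from voice cloning.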
Video dubbing takes a YouTube URL or a local video file. It runs transcription using WhisperX, translates the transcript, synthesizes new audio using the TTS engine, and exports an MP4. The entire pipeline runs locally.
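
The dubbing steps above can be sketched as a chain of four stages. This is an illustrative outline, not OmniVoice Studio's code: every name here (`Segment`, `DubbedSegment`, `run_dubbing`) is hypothetical, and the stages are passed in as plain callables so the sketch makes no claims about the WhisperX or OmniVoice APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """A transcribed span of speech with its timing in seconds."""
    start: float
    end: float
    text: str


@dataclass
class DubbedSegment:
    """A translated span with synthesized audio, keeping the original timing."""
    start: float
    end: float
    text: str
    audio: bytes


def run_dubbing(
    source: str,
    transcribe: Callable[[str], List[Segment]],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    export: Callable[[str, List[DubbedSegment]], str],
) -> str:
    """Run transcribe -> translate -> synthesize -> export and return the output path."""
    segments = transcribe(source)
    dubbed = []
    for seg in segments:
        translated = translate(seg.text)
        # Keep the original timestamps so the new audio lines up with the video.
        dubbed.append(DubbedSegment(seg.start, seg.end, translated, synthesize(translated)))
    return export(source, dubbed)
```

Keeping the timestamps on each segment is what lets the export stage place the synthesized audio at the right point in the video. In a real setup, each callable would wrap the corresponding local tool.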











