Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs

ElevenLabs charges between $5 and $330 per month for its voice AI plans, and every audio file you process goes through its cloud servers. For anyone looking for an open-source alternative to ElevenLabs, OmniVoice Studio is a good fit: an open-source desktop application that runs the same categories of tasks locally. It is a notable individual project that handles voice cloning, video dubbing, real-time dictation, vocal isolation, and speaker diarization without sending data to an external server.

The application bundles six distinct capabilities. Understanding each one helps clarify what the system is doing under the hood.

Voice cloning works from a 3-second audio clip. The system uses zero-shot learning, meaning it can clone a voice that was never part of its training data. It does this by conditioning a diffusion-based TTS model on the short reference audio. The underlying model, OmniVoice from k2-fsa, supports more than 600 languages.
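
Because the reference clip is the only voice-specific input to cloning, it is worth checking it before running the model. The sketch below is a hypothetical pre-flight check, not OmniVoice Studio's own code: it assumes the reference is an uncompressed PCM WAV file and uses only Python's standard `wave` module to reject clips shorter than the roughly 3 seconds the article cites. The function names and the `MIN_REFERENCE_SECONDS` constant are illustrative assumptions.

```python
import io
import wave

# Assumed threshold, taken from the article's "3-second audio clip" figure.
MIN_REFERENCE_SECONDS = 3.0


def clip_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV clip in seconds (frames / sample rate)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_usable_reference(wav_bytes: bytes, min_seconds: float = MIN_REFERENCE_SECONDS) -> bool:
    """Hypothetical check: accept the clip only if it is at least min_seconds long."""
    return clip_duration_seconds(wav_bytes) >= min_seconds


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, handy for testing the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()
```

A real reference clip would also need enough clean speech, not just enough duration; silence passes this length check, which is why it is only a first filter.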

Voice design lets you build a new voice from parameters: gender, age, accent, pitch, speed, emotion, and dialect — without cloning any existing voice.
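
One way to picture voice design is as a small, validated parameter record. The sketch below is hypothetical: the `VoiceDesign` class, the 0.5 to 2.0 range for pitch and speed multipliers, and the idea of rendering the fields into a single text description are all assumptions for illustration. The article does not say how OmniVoice Studio encodes these parameters or passes them to the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceDesign:
    """Hypothetical container for the voice-design parameters the article lists."""
    gender: str                    # e.g. "female"
    age: str                       # e.g. "young adult"
    accent: Optional[str] = None   # e.g. "British"
    pitch: float = 1.0             # relative multiplier, 1.0 = neutral (assumed scale)
    speed: float = 1.0             # relative multiplier, 1.0 = normal rate (assumed scale)
    emotion: Optional[str] = None  # e.g. "calm"
    dialect: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject out-of-range multipliers; the accepted range is an assumption.
        for name in ("pitch", "speed"):
            value = getattr(self, name)
            if not 0.5 <= value <= 2.0:
                raise ValueError(f"{name} must be between 0.5 and 2.0, got {value}")

    def to_prompt(self) -> str:
        """Render the set fields as one comma-separated description string."""
        parts = [self.gender, self.age]
        for label in ("accent", "dialect", "emotion"):
            value = getattr(self, label)
            if value:
                parts.append(f"{value} {label}")
        # Only mention pitch and speed when they differ from neutral.
        if self.pitch != 1.0:
            parts.append(f"pitch x{self.pitch:g}")
        if self.speed != 1.0:
            parts.append(f"speed x{self.speed:g}")
        return ", ".join(parts)
```

For example, `VoiceDesign("female", "young adult", accent="British", emotion="calm", speed=1.1).to_prompt()` yields `"female, young adult, British accent, calm emotion, speed x1.1"`. Making the record frozen keeps a designed voice reproducible once it has been validated.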

Video dubbing takes a YouTube URL or a local video file. It runs transcription using WhisperX, translates the transcript, synthesizes new audio using the TTS engine, and exports an MP4. The entire pipeline runs locally.
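
The dubbing pipeline is a chain of four stages whose outputs feed one another. The sketch below shows only that data flow, with every stage injected as a plain function: it does not call WhisperX, a translation model, or the TTS engine, and the `Segment` type, the stage signatures, and the idea of passing each segment's original length to the synthesizer are assumptions for illustration. Downloading a YouTube URL is left out; the sketch starts from a local file path.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One timed chunk of transcribed speech."""
    start: float  # seconds from the start of the source video
    end: float
    text: str


# Hypothetical stage signatures. In a real build these would wrap WhisperX
# (transcription), a translation model, the TTS engine, and an MP4 muxer.
Transcribe = Callable[[str], List[Segment]]
Translate = Callable[[str, str], str]        # (text, target_lang) -> text
Synthesize = Callable[[str, float], bytes]   # (text, target_seconds) -> audio
Export = Callable[[str, List[Tuple[float, bytes]], str], str]


def dub_video(video_path: str, target_lang: str, transcribe: Transcribe,
              translate: Translate, synthesize: Synthesize, export: Export,
              out_path: str) -> str:
    """Run transcribe -> translate -> synthesize -> export; return the output path."""
    clips: List[Tuple[float, bytes]] = []
    for seg in transcribe(video_path):
        translated = translate(seg.text, target_lang)
        # Passing the original segment length lets the TTS stage aim for
        # similar timing, so dubbed speech stays near the on-screen speaker.
        audio = synthesize(translated, seg.end - seg.start)
        clips.append((seg.start, audio))
    return export(video_path, clips, out_path)


# Stand-in stages so the sketch runs without any models installed.
exported_clips: List[Tuple[float, bytes]] = []


def fake_transcribe(path: str) -> List[Segment]:
    return [Segment(0.0, 1.5, "hello"), Segment(2.0, 3.0, "world")]


def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


def fake_synthesize(text: str, seconds: float) -> bytes:
    return text.encode("utf-8")


def fake_export(video: str, clips: List[Tuple[float, bytes]], out: str) -> str:
    exported_clips.extend(clips)  # record what would be muxed into the MP4
    return out
```

Keeping each stage behind a function boundary is what allows the whole chain to run locally: any stage can be swapped for a different local model without touching the orchestration.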
