AI Dubbing

Dub any content. Keep the performer's voice

Cross-lingual voice cloning trained to preserve timbre, pacing, and character across every language your release covers. One performance, over 100 languages, every market, one voice.
Dubbing pipeline
Source
voice_clone 5-15s reference

EN master · Narrator A · 22 minutes

Output
voiceId narrator_a (pvc)

ES · FR · DE · JA, same performer, 4 languages in parallel

Powered by
TTS

Dubbing that doesn't re-cast the performer.

Already dubbing at real volume

Studios and consumer-media platforms dub on Inworld every day.

Consumer-media platforms run over a million dubbing minutes a month on Inworld across Spanish, French, German, and Portuguese.
Dubbing at studio and consumer scale
Consumer-media platforms: 1M+ dubbing minutes a month · ES/FR/DE/PT
Studio partners: Custom tools for dubbing, translation, news
Developer platforms: Video dubbing integration
Ad-tech + localization: Personalized voiceover at scale
Cross-lingual voice cloning

Clone once. Dub into every language on your release schedule.

The performer's voice is part of the brand. Cross-lingual cloning preserves timbre and pacing across every supported language. No re-casting.
Cross-lingual cloning
Original: EN · Narrator A
Dubbed: ES / FR / DE / JA · same voice
One cloned performance, every target language, no re-casting.
Over 100 languages out of the gate

Every major release market in a single API.

Realtime TTS-2 covers over 100 languages, from English and Spanish to Japanese, Korean, Mandarin, Hindi, and Arabic. Cross-lingual cloning works across every pair, with one voice identity preserved end-to-end.
TTS 1.5 · language coverage: 15 languages, one cloned voice
Realtime TTS-2 expands this further: steering, conversationality, and more languages on the roadmap.
Ship every market the same day

One script in, every target language out, in parallel.

You don't wait on Spanish to start French. The whole release dubs at once, so your launch waits only on the slowest language, not on the sum of all of them.
Scripted pipeline, parallel output
non-streaming endpoint
01 Source script: EN master
02 Clone once: 5-15s reference audio
03 Dub in parallel: 15 target languages
04 Deliver: MP3 · WAV · 48kHz
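
The four steps above can be sketched as a simple fan-out, one job per target language. This is a minimal illustration rather than the Inworld client: `dub_in_parallel`, `dub_one`, and `fake_dub` are hypothetical names, and `dub_one` stands in for whatever translation and non-streaming synthesis request your pipeline actually makes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def dub_in_parallel(
    script: str,
    voice_id: str,
    languages: List[str],
    dub_one: Callable[[str, str, str], bytes],
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Run one dubbing job per target language concurrently.

    `dub_one(script, voice_id, language)` is a hypothetical callable that
    wraps your own translation + non-streaming synthesis request.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every language up front so no job waits on another.
        futures = {
            lang: pool.submit(dub_one, script, voice_id, lang)
            for lang in languages
        }
        # .result() re-raises any worker exception, so a failed language
        # surfaces here instead of being silently dropped.
        return {lang: fut.result() for lang, fut in futures.items()}


def fake_dub(script: str, voice_id: str, language: str) -> bytes:
    """Stand-in for a real request; returns tagged bytes, not audio."""
    return f"{language}:{voice_id}:{script}".encode("utf-8")


if __name__ == "__main__":
    out = dub_in_parallel(
        "Hello.", "narrator_a", ["es", "fr", "de", "ja"], fake_dub
    )
    print(sorted(out))
```

Threads suit this shape because each job mostly waits on a network response; total wall-clock time then tracks the slowest language rather than the sum of all of them.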
Dub that sounds like acting

The performance survives the translation.

Most dubs sound like someone reading the subtitles. Here, the Spanish take keeps the same pause and half-laugh as the English original, because the performance transfers, not just the words.
One voice, every target language
EN · source
ES · dubbed
FR · dubbed
DE · dubbed
JA · dubbed
PT · dubbed
Same timbre, same pacing, same performer. One clone preserves the voice across every dub.
Unreleased scripts stay in the building

Dub on-prem, so pre-release IP never leaves your cluster.

Deploy in your own infrastructure and keep every frame behind your firewall. SOC 2 Type II, GDPR, zero retention by default.
Pre-release content stays in the studio
On-premise deployment: H100 / A100 / H200 / B200 / B300
Studio-specific custom models: Trained on your tonality
Zero data retention: Pre-release content never leaves
SOC 2 Type II · GDPR: Certified
Run the whole dub pipeline inside your own cluster. Unreleased scripts never leave the lot.

FAQ

Over 100 languages with Realtime TTS-2 (research preview, launched April 29, 2026), with one voice identity preserved across every language. Realtime TTS 1.5 covers 15 languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Mandarin, Hindi, Arabic, Polish, Dutch, Russian, Hebrew. See Realtime TTS-2 launch intel for the full list.
Instant cloning needs 5-15 seconds of reference audio. Professional cloning uses 30+ minutes for higher fidelity. Cross-lingual cloning preserves voice identity across every supported language: clone in English, then dub in Spanish, French, or German with the same voice.
Professional Voice Clones (PVC) are available for studio-scale workflows, with licensing coordination, dedicated training, and higher-fidelity models. Contact sales for PVC scoping.
Cloning requires rights to the source audio, and rights clearance sits with you. The tooling and infrastructure are ours; the licensing is yours. Cloned voices belong to your account under the Inworld terms of service and can be used commercially within that scope.
Dubbing is typically asynchronous and runs on the non-streaming TTS endpoint. The Realtime API is the live avatar path, not the dubbing path. For live speech-to-speech translation, the Realtime API plus a translation-layer LLM gives you a near-realtime dubbed response.
On-premise deployment is available on H100, A100, H200, B200, and B300 for pre-release content protection, along with studio-specific custom training, zero data retention on TTS by default, SOC 2 Type II, and GDPR. Pre-release IP never leaves your infrastructure.
Realtime TTS-2 adds natural-language voice direction, conversational awareness (the model hears prior-turn audio, not just transcripts), cross-lingual identity preservation across over 100 languages, and Advanced Voice Design with three stability modes. That's where 'dub that sounds like acting' comes from: performance-level output rather than translated reading. Realtime TTS-2 is available now in research preview; contact sales for production access.
Output formats are MP3, WAV, PCM, LINEAR16, OGG_OPUS, μ-law, A-law, and FLAC, at sample rates from 8 to 48kHz. 48kHz is recommended for broadcast and streaming dubbing pipelines.
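
The numeric limits quoted in the answers above (5-15 seconds for instant cloning, 30+ minutes for professional cloning, 8-48kHz output) lend themselves to a pre-flight check before a job is submitted. The sketch below is illustrative only: `clone_tier` and `check_output` are hypothetical helpers, the format strings simply mirror the names listed above rather than any confirmed API enum, and it assumes any rate inside the range is acceptable, which the actual service may not allow.

```python
from typing import Optional

# Format names copied from the list above; the identifiers a real API
# expects may be spelled differently (assumption).
SUPPORTED_FORMATS = {
    "MP3", "WAV", "PCM", "LINEAR16", "OGG_OPUS", "μ-law", "A-law", "FLAC",
}


def clone_tier(reference_seconds: float) -> Optional[str]:
    """Map reference-audio length to the cloning tier it qualifies for."""
    if 5 <= reference_seconds <= 15:
        return "instant"
    if reference_seconds >= 30 * 60:
        return "professional"
    # Lengths outside both stated ranges are not covered by the text.
    return None


def check_output(fmt: str, sample_rate_hz: int) -> None:
    """Raise ValueError if the requested output falls outside stated limits."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if not 8_000 <= sample_rate_hz <= 48_000:
        raise ValueError(f"sample rate out of range: {sample_rate_hz} Hz")


if __name__ == "__main__":
    print(clone_tier(10))        # 10 s falls in the instant-cloning range
    check_output("WAV", 48_000)  # the recommended broadcast setting passes
```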

Dub without re-casting.

Cross-lingual voice cloning across over 100 languages. Studio-grade on-prem. Performance that sounds like acting.
Copyright © 2021-2026 Inworld AI