AI Dubbing

Dub any content. Keep the performer's voice.

Cross-lingual voice cloning trained to preserve timbre, pacing, and character across every language your release covers. One performance, 15 languages, every market, one voice.
Dubbing pipeline
Source: voice_clone · 5-15s reference
EN master · Narrator A · 22 minutes
Output: voiceId narrator_a (pvc)
ES · FR · DE · JA, same performer, 4 languages in parallel

Powered by
TTS

Dubbing that doesn't re-cast the performer.

Already dubbing at real volume

Studios and consumer-media platforms dub on Inworld every day.

Consumer-media platforms run over a million dubbing minutes a month on Inworld across Spanish, French, German, and Portuguese.
Dubbing at studio and consumer scale
Consumer-media platforms: 1M+ dubbing minutes a month · ES/FR/DE/PT
Studio partners: Custom tools for dubbing, translation, news
Developer platforms: Video dubbing integration
Ad-tech + localization: Personalized voiceover at scale
Cross-lingual voice cloning

Clone once. Dub into every language on your release schedule.

The performer's voice is part of the brand. Cross-lingual cloning preserves timbre and pacing across every supported language. No re-casting.
Cross-lingual cloning
Original: EN · Narrator A
Dubbed: ES / FR / DE / JA · same voice
One cloned performance, every target language, no re-casting.
15 languages out of the gate

Every major release market in a single API.

TTS 1.5 covers 15 languages, from English and Spanish to Japanese, Korean, Mandarin, Hindi, and Arabic. Cross-lingual cloning works across every pair.
TTS 1.5 · language coverage
15 languages, one cloned voice
TTS 2.0 expands this further: steering, conversationality, and more languages on the roadmap.
Ship every market the same day

One script in, every target language out, in parallel.

You don't wait on Spanish to start French. The whole release dubs at once, so your launch waits only on the slowest language, not on the sum of all of them.
Scripted pipeline, parallel output (non-streaming endpoint)
01 Source script: EN master
02 Clone once: 5-15s reference audio
03 Dub in parallel: 15 target languages
04 Deliver: MP3 · WAV · 48kHz
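
The parallel step in the pipeline above can be sketched in a few lines of Python. This is only an illustration of the fan-out pattern, not the Inworld API: `dub_script` is a hypothetical stand-in for a request to the non-streaming endpoint, and the function names, parameters, and return shape are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def dub_script(script: str, voice_id: str, language: str) -> dict:
    """Hypothetical stand-in for one non-streaming dub request.

    A real integration would call the TTS endpoint here; this stub only
    returns a placeholder payload so the fan-out logic can be run locally.
    """
    return {
        "language": language,
        "voice_id": voice_id,
        "audio": f"<{language} audio for {len(script)} chars>".encode(),
    }


def dub_all(script: str, voice_id: str, languages: list[str]) -> dict[str, dict]:
    """Submit one dub job per target language and collect results by language."""
    # One worker per language, so no language waits for another to finish.
    with ThreadPoolExecutor(max_workers=max(1, len(languages))) as pool:
        futures = {
            lang: pool.submit(dub_script, script, voice_id, lang)
            for lang in languages
        }
        # result() blocks until that language's job is done.
        return {lang: future.result() for lang, future in futures.items()}


if __name__ == "__main__":
    results = dub_all("Welcome back to the show.", "narrator_a", ["es", "fr", "de", "ja"])
    for lang, result in results.items():
        print(lang, len(result["audio"]), "bytes")
```

Because every language is submitted before any result is awaited, total wall-clock time tracks the slowest single job rather than the sum of all jobs.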
Dub that sounds like acting

The performance survives the translation.

Most dubs sound like someone reading the subtitles. Here, the Spanish take keeps the same pause and half-laugh as the English original, because the performance transfers, not just the words.
One voice, every target language
EN (source) · ES (dubbed) · FR (dubbed) · DE (dubbed) · JA (dubbed) · PT (dubbed)
Same timbre, same pacing, same performer. One clone preserves the voice across every dub.
Unreleased scripts stay in the building

Dub on-prem, so pre-release IP never leaves your cluster.

Deploy in your own infrastructure and keep every frame behind your firewall. SOC 2 Type II, GDPR, zero retention by default.
Pre-release content stays in the studio
On-premise deployment: H100 / A100 / H200 / B200 / B300
Studio-specific custom models: Trained on your tonality
Zero data retention: Pre-release content never leaves
SOC 2 Type II · GDPR: Certified
Run the whole dub pipeline inside your own cluster. Unreleased scripts never leave the lot.

FAQ

15 languages in TTS 1.5: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Mandarin, Hindi, Arabic, Polish, Dutch, Russian, Hebrew. TTS 2.0 expands this; see TTS 2.0 launch intel for the roadmap.
Instant cloning with 5-15 seconds of reference audio. Professional cloning with 30+ minutes for higher fidelity. Cross-lingual cloning preserves voice identity across every supported language: clone in English, then dub in Spanish, French, or German with the same voice.
Yes. Professional Voice Clones (PVC) are available for studio-scale workflows with licensing coordination, dedicated training, and higher-fidelity models. Contact sales for PVC scoping.
Cloning requires rights to the source audio, and rights clearance sits with you. The tooling and infrastructure are ours; the licensing is yours. Cloned voices belong to your account under the Inworld terms of service and can be used commercially within that scope.
Dubbing is typically asynchronous (non-streaming TTS endpoint). The Realtime API is not the dubbing path; it is the live avatar path. For live speech-to-speech translation use cases, the Realtime API plus a translation-layer LLM gives you a near-realtime dubbed response.
Yes. On-premise deployment on H100, A100, H200, B200, and B300 GPUs for pre-release content protection. Studio-specific custom training, zero data retention on TTS by default, SOC 2 Type II, GDPR. Pre-release IP never leaves your infrastructure.
TTS 2.0 adds natural-language steering, conversationality with disfluencies, singing, denoise, and more. That's where 'dub that sounds like acting' comes from: performance-level output rather than translated reading. TTS 2.0 is in active development; contact sales for early access.
MP3, WAV, PCM, LINEAR16, OGG_OPUS, μ-law, A-law, FLAC. Sample rates from 8 to 48 kHz; 48 kHz is recommended for broadcast and streaming dubbing pipelines.
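
The limits quoted in the answers above (reference-audio length per cloning tier, output formats, sample-rate range) lend themselves to a pre-flight check before a job is submitted. The sketch below is illustrative only: the `DubJobSpec` class, the format labels `MULAW` and `ALAW`, and the decision to reject reference audio that falls between the two documented tiers are all assumptions, not part of any published Inworld schema.

```python
from dataclasses import dataclass
from typing import List, Optional

# Figures taken from the FAQ text; identifiers are hypothetical.
# "MULAW" and "ALAW" are assumed labels for the μ-law and A-law formats.
SUPPORTED_FORMATS = {"MP3", "WAV", "PCM", "LINEAR16", "OGG_OPUS", "MULAW", "ALAW", "FLAC"}
MIN_SAMPLE_RATE_HZ = 8_000
MAX_SAMPLE_RATE_HZ = 48_000
INSTANT_CLONE_SECONDS = (5, 15)          # instant cloning: 5-15 s of reference audio
PROFESSIONAL_CLONE_MIN_SECONDS = 30 * 60  # professional cloning: 30+ minutes


@dataclass
class DubJobSpec:
    reference_seconds: float
    output_format: str
    sample_rate_hz: int


def clone_tier(reference_seconds: float) -> Optional[str]:
    """Map reference-audio length to a cloning tier, or None if it fits neither.

    Assumption: lengths between 15 s and 30 min match no documented tier.
    """
    low, high = INSTANT_CLONE_SECONDS
    if low <= reference_seconds <= high:
        return "instant"
    if reference_seconds >= PROFESSIONAL_CLONE_MIN_SECONDS:
        return "professional"
    return None


def validate(spec: DubJobSpec) -> List[str]:
    """Return a list of human-readable problems; an empty list means the spec passes."""
    problems = []
    if clone_tier(spec.reference_seconds) is None:
        problems.append("reference audio length matches no documented cloning tier")
    if spec.output_format.upper() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported output format: {spec.output_format}")
    if not MIN_SAMPLE_RATE_HZ <= spec.sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
        problems.append(f"sample rate out of range: {spec.sample_rate_hz} Hz")
    return problems


if __name__ == "__main__":
    print(validate(DubJobSpec(reference_seconds=10, output_format="wav", sample_rate_hz=48_000)))
```

Catching these mismatches locally keeps a bad configuration from consuming a dubbing run across every target language.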

Dub without re-casting.

Cross-lingual voice cloning across 15 languages. Studio-grade on-prem. Performance that sounds like acting.
Copyright © 2021-2026 Inworld AI