Question 1

Which languages can I dub into?

Accepted Answer

Realtime TTS-2 (research preview, launched April 29, 2026) supports over 100 languages, with one voice identity preserved across every language. Realtime TTS 1.5 covers 15 languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Mandarin, Hindi, Arabic, Polish, Dutch, Russian, and Hebrew. See Realtime TTS-2 launch intel for the full list.

Question 2

How does cross-lingual cloning work?

Accepted Answer

Instant cloning needs 5-15 seconds of reference audio. Professional cloning uses 30+ minutes for higher fidelity. Cross-lingual cloning preserves voice identity across every supported language: clone in English, then dub in Spanish, French, or German with the same voice.
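As a rough illustration of the reference-audio lengths quoted above, the following sketch picks a cloning tier from a clip's duration. The function name and the decision to treat the stated ranges as hard limits are assumptions made for this example; the service itself may accept other lengths.

```python
from typing import Optional

# Durations taken from the answer above, expressed in seconds.
INSTANT_MIN_S = 5.0
INSTANT_MAX_S = 15.0
PROFESSIONAL_MIN_S = 30 * 60.0


def cloning_tier(reference_seconds: float) -> Optional[str]:
    """Return the cloning tier a clip of this length fits, or None.

    Hypothetical helper: treating the quoted ranges as strict bounds
    is an assumption, not documented service behavior.
    """
    if reference_seconds >= PROFESSIONAL_MIN_S:
        return "professional"
    if INSTANT_MIN_S <= reference_seconds <= INSTANT_MAX_S:
        return "instant"
    return None
```

A pre-upload check like this can flag clips that are too short for instant cloning before any request is made.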

Question 3

Do you support professional voice clones?

Accepted Answer

Yes. Professional Voice Clones (PVC) are available for studio-scale workflows with licensing coordination, dedicated training, and higher-fidelity models. Contact sales for PVC scoping.

Question 4

Who handles voice rights?

Accepted Answer

Cloning requires rights to the source audio, and rights clearance sits with you. The tooling and infrastructure are ours; the licensing is yours. Cloned voices belong to your account under the Inworld terms of service and can be used commercially within that scope.

Question 5

Is dubbing realtime?

Accepted Answer

Dubbing is typically asynchronous and runs through the non-streaming TTS endpoint. The Realtime API is the live avatar path, not the dubbing path. For live speech-to-speech translation, pairing the Realtime API with a translation-layer LLM gives you a near-realtime dubbed response.
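Because dubbing is asynchronous, a batch job can fan one translated script out to several target languages at once. The sketch below shows that pattern with asyncio. The `synthesize` callable is a stand-in supplied by the caller, and `fake_synthesize` only exists so the example runs; neither is the Inworld API, and the argument names are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical signature: (text, language_code, voice_id) -> audio bytes.
SynthFn = Callable[[str, str, str], Awaitable[bytes]]


async def dub_all(
    translated: Dict[str, str], voice_id: str, synthesize: SynthFn
) -> Dict[str, bytes]:
    """Synthesize every translated script concurrently with one voice."""
    languages = list(translated)
    tasks = [synthesize(translated[lang], lang, voice_id) for lang in languages]
    # gather() returns results in the same order as the tasks it was given.
    audio = await asyncio.gather(*tasks)
    return dict(zip(languages, audio))


async def fake_synthesize(text: str, language: str, voice_id: str) -> bytes:
    """Placeholder that echoes its inputs instead of calling a TTS service."""
    await asyncio.sleep(0)
    return f"{voice_id}:{language}:{text}".encode("utf-8")


if __name__ == "__main__":
    scripts = {"es": "Hola", "fr": "Bonjour", "de": "Hallo"}
    result = asyncio.run(dub_all(scripts, "performer-voice", fake_synthesize))
    print(sorted(result))
```

Swapping `fake_synthesize` for a real client call keeps the fan-out logic unchanged, since `dub_all` only depends on the callable's shape.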

Question 6

Can we deploy on-premise?

Accepted Answer

Yes. On-premise deployment is available on H100, A100, H200, B200, and B300 GPUs for pre-release content protection. Also on offer: studio-specific custom training, zero data retention on TTS by default, SOC 2 Type II, and GDPR. Pre-release IP never leaves your infrastructure.

Question 7

What does Realtime TTS-2 add for dubbing?

Accepted Answer

Realtime TTS-2 adds natural-language voice direction, conversational awareness (the model hears prior-turn audio, not just transcripts), cross-lingual identity preservation across over 100 languages, and Advanced Voice Design with three stability modes. That's where 'dub that sounds like acting' comes from: performance-level output rather than translated reading. Realtime TTS-2 is available now in research preview; contact sales for production access.

Question 8

What output formats?

Accepted Answer

MP3, WAV, PCM, LINEAR16, OGG_OPUS, μ-law, A-law, and FLAC. Sample rates range from 8 to 48 kHz; 48 kHz is recommended for broadcast and streaming dubbing pipelines.
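One way to catch a bad output setting before a job is submitted is a small config object that checks the format and sample rate against the values listed above. This is only a sketch: the class, the field names, and the `MULAW`/`ALAW` spellings are assumptions, and the service may accept only specific rates within the 8-48 kHz range.

```python
from dataclasses import dataclass

# Formats listed in the answer above; MULAW/ALAW spellings are assumed.
SUPPORTED_FORMATS = frozenset(
    {"MP3", "WAV", "PCM", "LINEAR16", "OGG_OPUS", "MULAW", "ALAW", "FLAC"}
)
MIN_SAMPLE_RATE_HZ = 8_000
MAX_SAMPLE_RATE_HZ = 48_000


@dataclass(frozen=True)
class OutputConfig:
    """Hypothetical output settings for a dubbing job."""

    audio_format: str
    # 48 kHz is the rate recommended for broadcast and streaming pipelines.
    sample_rate_hz: int = MAX_SAMPLE_RATE_HZ

    def __post_init__(self) -> None:
        if self.audio_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {self.audio_format}")
        if not MIN_SAMPLE_RATE_HZ <= self.sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
            raise ValueError(f"sample rate out of range: {self.sample_rate_hz}")
```

For example, `OutputConfig("WAV")` defaults to 48 kHz, while `OutputConfig("WAV", 96_000)` raises a `ValueError`.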

Dub any content. Keep the performer's voice.

Dubbing that doesn't re-cast the performer.

Studios and consumer-media platforms dub on Inworld every day.

Clone once. Dub into every language on your release schedule.

Every major release market in a single API.

One script in, every target language out, in parallel.

The performance survives the translation.

Dub on-prem, so pre-release IP never leaves your cluster.

FAQ

Dub without re-casting.