Voice Design

Create a custom voice from a text description

Q: What is voice design and how is it different from voice cloning?

Voice cloning needs an audio sample of a real voice you already have. Voice design needs only a written description. Cloning reproduces a specific voice; design invents a new one to a spec. Use voice cloning when you have reference audio, and voice design when you want a voice that does not exist yet.

Q: How do I write a good voice design prompt?

Describe the voice across five dimensions: gender and age, accent or dialect, pitch and pace, tone or emotion, and timbre. Prompts must be written in English and run 30 to 250 characters. Concrete adjectives beat generic ones: "A warm, friendly woman in her late twenties with a soft British accent and calm pacing" produces a more predictable result than "a nice voice".
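As a rough illustration of the five dimensions and the length limits, the sketch below assembles a prompt from named parts and rejects anything outside 30 to 250 characters. The `VoiceSpec` type, `buildDesignPrompt` function, and the sentence template are hypothetical helpers invented for this example, not part of the Inworld SDK, and the length check counts JavaScript string units, which may differ from how the service counts characters.

```typescript
// Hypothetical helper: one field per dimension named in the answer above.
interface VoiceSpec {
  tone: string;         // e.g. "warm, friendly"
  genderAndAge: string; // e.g. "woman in her late twenties"
  accent: string;       // e.g. "soft British accent"
  pitchAndPace: string; // e.g. "calm pacing"
  timbre: string;       // e.g. "smooth"
}

// Limits as stated in this FAQ.
const MIN_PROMPT_CHARS = 30;
const MAX_PROMPT_CHARS = 250;

// Join the five dimensions into one sentence and enforce the length limits.
function buildDesignPrompt(spec: VoiceSpec): string {
  const text =
    `A ${spec.tone} ${spec.genderAndAge} with a ${spec.accent}, ` +
    `${spec.pitchAndPace}, and a ${spec.timbre} timbre.`;
  if (text.length < MIN_PROMPT_CHARS || text.length > MAX_PROMPT_CHARS) {
    throw new RangeError(
      `Prompt is ${text.length} characters; expected ${MIN_PROMPT_CHARS} to ${MAX_PROMPT_CHARS}.`,
    );
  }
  return text;
}

const designPromptExample = buildDesignPrompt({
  tone: 'warm, friendly',
  genderAndAge: 'woman in her late twenties',
  accent: 'soft British accent',
  pitchAndPace: 'calm pacing',
  timbre: 'smooth',
});
console.log(designPromptExample);
```

Keeping each dimension in its own field makes it easier to change one attribute at a time when a preview is close but not quite right.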

Q: How many voice previews do I get per prompt?

Each design request returns up to three previews. You set numberOfSamples between 1 and 3. Listen to each preview, then publish the one you want to keep as a permanent voice.
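A client-side guard for the 1 to 3 range might look like the sketch below. The `DesignRequest` shape uses the `designPrompt` and `numberOfSamples` field names from the example request on this page; the `withSampleCount` helper itself is hypothetical, and the service may validate the value differently.

```typescript
// Field names follow the example request on this page; the type itself is an assumption.
interface DesignRequest {
  designPrompt: string;
  numberOfSamples: number;
}

// Build a request only when the sample count is an integer from 1 to 3.
function withSampleCount(designPrompt: string, requested: number): DesignRequest {
  if (!Number.isInteger(requested) || requested < 1 || requested > 3) {
    throw new RangeError(
      `numberOfSamples must be an integer from 1 to 3, got ${requested}`,
    );
  }
  return { designPrompt, numberOfSamples: requested };
}
```

Failing early on the client avoids spending a request on a value the API is documented to reject.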

Q: Can I save a designed voice as a permanent voice ID?

Yes. Previews from the design endpoint are disposable. To keep a voice, call the publish endpoint to convert a preview into a permanent voiceId. The published voice is then available in your workspace and works across every Inworld product the same way a library voice does.

Q: Can I use a designed voice in a realtime voice agent?

Yes. A published voice is a standard voiceId. You can pass it into the Realtime API session configuration as the output voice, and it will run at the same latency and quality as any built-in voice. Voice design is built for realtime voice agents, not only pre-rendered audio.

Q: How many languages does Inworld Voice Design support?

Voice design prompts themselves are English only, but the generated voice can speak other languages. The design API supports 14 language codes: English, Spanish, French, German, Italian, Portuguese, Polish, Dutch, Russian, Arabic, Japanese, Korean, Chinese, and Auto. For the full list and any updates, see the voice design documentation.

Q: Can I use a designed voice commercially?

Voices you design and publish are yours to use in production under the Inworld terms of service, including for commercial applications. Voice design is currently in research preview, so review the latest terms on the Inworld pricing page and documentation for any preview-specific guidance.

Q: Is voice design generally available?

Voice design is in research preview. The API, schema, and supported parameters may change as the feature matures. It is stable enough to build with today, and we recommend pinning to a published voiceId rather than regenerating from the same prompt if you need strict consistency.
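One way to follow the pinning advice is to run design and publish once, store the resulting voiceId, and reuse it on every later call. The sketch below assumes a simple key-value store and a caller-supplied `designAndPublish` function; both are hypothetical stand-ins for this example, not Inworld APIs.

```typescript
// Minimal store interface; a Map<string, string> satisfies it. In production this
// could be a config file or a database row (an assumption for this sketch).
interface VoiceIdStore {
  get(key: string): string | undefined;
  set(key: string, value: string): unknown;
}

// Return the pinned voiceId if one exists; otherwise design and publish once,
// remember the result, and return it.
async function getPinnedVoiceId(
  store: VoiceIdStore,
  key: string,
  designAndPublish: () => Promise<string>, // hypothetical: resolves to a published voiceId
): Promise<string> {
  const pinned = store.get(key);
  if (pinned !== undefined) {
    return pinned;
  }
  const voiceId = await designAndPublish();
  store.set(key, voiceId);
  return voiceId;
}
```

Because the same prompt can produce a different voice on each generation, only the stored voiceId gives a stable result across deployments.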

Inworld AI Voice Design turns a written description into a production-ready voice. Describe the age, accent, timbre, and pacing you want, get up to three previews, then deploy the voice in TTS, the Realtime API, or any voice agent you build on Inworld.

Voice design request

Prompt

langCode: EN_US · numberOfSamples: 3 · prompt_chars: 138 / 250

A warm, friendly woman in her late twenties with a soft British accent and calm pacing, suitable for a guided meditation app.

Preview

voiceId: workspace__design-voice-a2f · duration_secs: 4.8 · status: ready_to_publish

Hello, this is a sample of my voice. Take a slow breath with me.

Works with: TTS, Realtime API, Router

A studio in a sentence.

Six reasons teams are designing production voices from a prompt instead of a recording booth.

Prompt, don't record

Works with: TTS, Realtime API

The voice in your head, rendered.

Forget casting calls, studio sessions, and hunting through voice libraries. Write the voice you want in plain English and hear it back in seconds. If you can describe it, you can ship it.

Try a prompt

Prompt (138 / 250 chars): A warm, friendly woman in her late twenties with a soft British accent and calm pacing, suitable for a guided meditation app.

EN_US · 3 previews

Three takes, one pick

Works with: TTS

Three versions of your voice. Keep the one that nails it.

Every prompt returns up to three distinct voice previews. Listen, compare, and publish the one that fits your product. No wasted generations, no commitment until you say so.

Three previews (pick one)

preview_01 · workspace__design-voice-a2f · 4.8s
preview_02 · workspace__design-voice-b91 · 5.2s
preview_03 · workspace__design-voice-c04 · 4.6s

No latency cost

Works with: Realtime API, TTS

Your designed voice starts speaking in under a second.

A custom voice carries no latency penalty. It streams on the same path as every library voice, with the same sub-200ms median first-chunk time. Design without compromising performance.

Explore the Realtime API

Voice Design · streaming latency: <200ms median first audio chunk. Designed voices stream identically to library voices.

Sounds like a person

Works with: TTS

Design a voice that doesn't sound designed.

The output isn't a TTS voice with a different label. It's a voice built to your spec — the right age, weight, pace, and texture. If the first take is close but not quite right, adjust the description and regenerate. No re-booking, no new session.

Refine until it's right (v1 → v2)

First prompt: "A calm female narrator." (too generic; iterate)

Refined prompt: "A warm, deliberate woman in her early forties. Measured pacing, clear diction, slight Welsh lilt."

No re-booking. No new session. Tweak the prompt and generate again.

Publish, reuse, version

Works with: TTS, Realtime API, Router

Two calls from idea to permanent voice ID.

Design, pick, publish. The voiceId you get back is yours forever. Drop it into any Inworld product, share it with your team, pin it for production, and call it like any other voice in the library.

Read the docs

publish.ts

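import { InworldTTS } from '@inworld/tts';

const tts = InworldTTS(); // reads INWORLD_API_KEY

// 1. Design (prompts must be 30 to 250 characters)
const result = await tts.designVoice({
  designPrompt: "A calm British narrator with a warm, measured delivery",
  numberOfSamples: 3,
});

// 2. Publish to a permanent voiceId
const voice = await tts.publishVoice({
  voice: result.previewVoices[0].voiceId,
  displayName: "Meditation Guide",
});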

#1 realtime TTS

Works with: TTS

Designed on the voice model that real users rank #1.

Voice Design is built on Realtime TTS 1.5 Max, which sits at the top of the Artificial Analysis leaderboard. Three of the top five voice models are ours. Your designed voice inherits every bit of that quality.

Artificial Analysis · TTS leaderboard: scores shown are 1208, 1176, 1165, and 1160. 3 of top 5 are Inworld. Designed voices inherit the same quality.

Design and publish a voice in two calls

Voice design is a two-step flow. First, design generates up to three previews. Then, publish saves the preview you chose as a permanent voice you can use anywhere.

import { InworldTTS } from '@inworld/tts';

const tts = InworldTTS(); // reads INWORLD_API_KEY

// 1. Design: generate up to three voice previews from a description
const result = await tts.designVoice({
  designPrompt:
    'A warm, friendly woman in her late twenties with a soft British accent and calm pacing.',
  previewText: 'Hello, this is a sample of my voice.',
  numberOfSamples: 3,
  langCode: 'EN_US',
});

// 2. Preview: listen to each generated voice (base64 WAV)
for (const preview of result.previewVoices) {
  console.log(preview.voiceId, preview.previewAudio.slice(0, 32));
}

// 3. Publish: save the one you like as a permanent voice
const voice = await tts.publishVoice({
  voice: result.previewVoices[0].voiceId,
  displayName: 'Meditation Guide - British EN',
  description: 'Calm, guided-meditation narrator',
  tags: ['meditation', 'british', 'female'],
});

// 4. Use it anywhere Realtime TTS, Realtime API, or voice agents run
const audio = await tts.synthesize({
  voiceId: voice.voiceId,
  modelId: 'inworld-tts-2',
  text: 'Welcome back. Let us begin with a slow breath.',
});
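To actually listen to the previews rather than log their first characters, the base64 audio can be decoded and written to disk. The sketch below assumes a Node.js runtime and the `voiceId` and `previewAudio` fields shown in the example above; the `PreviewVoice` type and both helper functions are hypothetical, and the RIFF check is only a light sanity test, not full WAV validation.

```typescript
import { writeFileSync } from 'node:fs';

// Shape assumed from the example above: a voiceId plus base64-encoded WAV audio.
interface PreviewVoice {
  voiceId: string;
  previewAudio: string;
}

// Decode base64 audio and confirm it starts with the "RIFF" marker of a WAV file.
function decodePreviewAudio(base64Wav: string): Buffer {
  const bytes = Buffer.from(base64Wav, 'base64');
  if (bytes.subarray(0, 4).toString('ascii') !== 'RIFF') {
    throw new Error('Decoded audio does not start with a RIFF header');
  }
  return bytes;
}

// Write each preview to <dir>/<voiceId>.wav and return the file paths.
function savePreviews(previews: PreviewVoice[], dir: string): string[] {
  return previews.map((preview) => {
    const path = `${dir}/${preview.voiceId}.wav`;
    writeFileSync(path, decodePreviewAudio(preview.previewAudio));
    return path;
  });
}
```

With the result from step 1, `savePreviews(result.previewVoices, '.')` would leave one WAV file per preview in the current directory, ready to compare before publishing.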

Prefer clicking? Design voices in the portal.

Open the TTS Playground, switch to the Design tab, type a description, and listen to previews side by side. Hit Improve Description to auto-expand short prompts, then save the voice to your workspace with one click. Everything the API does, plus a UI that lets your team design voices without writing code.

Open the playground

FAQ