Voice Design

Create a custom voice from a text description

Inworld AI Voice Design turns a written description into a production-ready voice. Describe the age, accent, timbre, and pacing you want, get up to three previews, then deploy the voice in TTS, the Realtime API, or any voice agent you build on Inworld.
Voice design request
Prompt
langCode EN_US · numberOfSamples 3 · prompt_chars 138 / 250

A warm, friendly woman in her late twenties with a soft British accent and calm pacing, suitable for a guided meditation app.

Preview
voiceId workspace__design-voice-a2f · duration_secs 4.8 · status ready_to_publish

Hello, this is a sample of my voice. Take a slow breath with me.

A studio in a sentence.

Six reasons teams are designing production voices from a prompt instead of a recording booth.
Prompt, don't record
Works with
TTS · Realtime API

The voice in your head, rendered.

Forget casting calls, studio sessions, and hunting through voice libraries. Write the voice you want in plain English and hear it back in seconds. If you can describe it, you can ship it.
Three takes, one pick
Works with
TTS

Three versions of your voice. Keep the one that nails it.

Every prompt returns up to three distinct voice previews. Listen, compare, and publish the one that fits your product. No wasted generations, no commitment until you say so.
Three previews
Pick one
preview_01 · workspace__design-voice-a2f · 4.8s
preview_02 · workspace__design-voice-b91 · 5.2s
preview_03 · workspace__design-voice-c04 · 4.6s
No latency cost
Works with
Realtime API · TTS

Your designed voice starts speaking in under a second.

A custom voice carries no latency penalty. It streams on the same path as every library voice, with the same sub-200ms median first-chunk time. Design without compromising performance.
Explore the Realtime API
Voice Design · streaming latency
<200ms
Median first audio chunk
Designed voices stream identically to library voices
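
To check first-chunk latency from your own client, a helper like the one below can time any async stream of audio chunks. This is a minimal sketch, not part of the Inworld SDK: timeToFirstChunk and fakeAudioStream are hypothetical names, and the stand-in generator only simulates a stream. A client-side measurement also includes network time, so it will not match a server-side median exactly.

```typescript
// Result of timing a stream: milliseconds until the first chunk arrived
// (null if the stream ended without yielding anything) and the chunk count.
interface FirstChunkTiming {
  firstChunkMs: number | null;
  chunkCount: number;
}

// Consume any async stream and record how long the first chunk took to arrive.
// How you obtain a real audio stream is not shown here (assumption).
async function timeToFirstChunk<T>(stream: AsyncIterable<T>): Promise<FirstChunkTiming> {
  const start = Date.now();
  let firstChunkMs: number | null = null;
  let chunkCount = 0;
  for await (const _chunk of stream) {
    if (firstChunkMs === null) {
      firstChunkMs = Date.now() - start;
    }
    chunkCount += 1;
  }
  return { firstChunkMs, chunkCount };
}

// Stand-in stream for illustration only: yields `count` one-byte chunks,
// waiting `delayMs` before each one.
async function* fakeAudioStream(delayMs: number, count: number): AsyncGenerator<Uint8Array> {
  for (let i = 0; i < count; i++) {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    yield new Uint8Array([i]);
  }
}
```

Running the same helper against a designed voice and a library voice gives a like-for-like comparison on your own network path.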
Sounds like a person
Works with
TTS

Design a voice that doesn't sound designed.

The output isn't a TTS voice with a different label. It's a voice built to your spec — the right age, weight, pace, and texture. If the first take is close but not quite right, adjust the description and regenerate. No re-booking, no new session.
Refine until it's right
v1 → v2
First prompt
"A calm female narrator."
too generic — iterate
Refined prompt
"A warm, deliberate woman in her early forties. Measured pacing, clear diction, slight Welsh lilt."
No re-booking. No new session. Tweak the prompt and generate again.
Publish, reuse, version

Two calls from idea to permanent voice ID.

Design, pick, publish. The voiceId you get back is yours forever. Drop it into any Inworld product, share it with your team, pin it for production, and call it like any other voice in the library.
Read the docs
publish.ts
// 1. Design
const result = await tts.designVoice({
  designPrompt: "Calm British narrator",
  numberOfSamples: 3,
});

// 2. Publish to a permanent voiceId
const voice = await tts.publishVoice({
  voice: result.previewVoices[0].voiceId,
  displayName: "Meditation Guide",
});
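
One way to pin a published voice for production is to keep its voiceId in a small lookup your application code reads from, so nothing regenerates a voice at runtime. The sketch below is an assumption about how a team might organize this, not an Inworld feature: PINNED_VOICES and resolveVoiceId are hypothetical names, and the ID is the sample preview ID from this page used as a placeholder.

```typescript
// Hypothetical registry mapping logical voice names to published voiceIds.
// Replace the placeholder with the voiceId returned by your publish call.
const PINNED_VOICES = {
  meditationGuide: 'workspace__design-voice-a2f',
} as const;

type VoiceName = keyof typeof PINNED_VOICES;

// Type guard: true only for names defined directly on the registry.
function isVoiceName(name: string): name is VoiceName {
  return Object.prototype.hasOwnProperty.call(PINNED_VOICES, name);
}

// Look up a pinned voiceId, failing loudly on unknown names
// instead of silently falling back to some other voice.
function resolveVoiceId(name: string): string {
  if (!isVoiceName(name)) {
    throw new Error(`No pinned voice named "${name}"`);
  }
  return PINNED_VOICES[name];
}
```

Callers then pass resolveVoiceId('meditationGuide') wherever a voiceId is expected, and changing the production voice becomes a one-line edit in the registry.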
#1 ranked quality
Works with
TTS

Designed on the voice model that real users rank #1.

Voice Design is built on Inworld TTS 1.5 Max, which sits at the top of the Artificial Analysis leaderboard. Three of the top five voice models are ours. Your designed voice inherits every bit of that quality.
Artificial Analysis · TTS leaderboard
#1 · 1238
#2 · 1179
#3 · 1168
#5 · 1162
3 of top 5 are Inworld
Designed voices inherit the same quality

Design and publish a voice in two calls

Voice design is a two-step flow. First, design generates up to three previews. Then, publish saves the preview you chose as a permanent voice you can use anywhere.
import { InworldTTS } from '@inworld/tts';

const tts = InworldTTS(); // reads INWORLD_API_KEY

// 1. Design: generate up to three voice previews from a description
const result = await tts.designVoice({
  designPrompt: 'A warm, friendly woman in her late twenties with a soft British accent and calm pacing.',
  previewText: 'Hello, this is a sample of my voice.',
  numberOfSamples: 3,
  langCode: 'EN_US',
});

// 2. Preview: listen to each generated voice (base64 WAV)
for (const preview of result.previewVoices) {
  console.log(preview.voiceId, preview.previewAudio.slice(0, 32));
}

// 3. Publish: save the one you like as a permanent voice
const voice = await tts.publishVoice({
  voice: result.previewVoices[0].voiceId,
  displayName: 'Meditation Guide - British EN',
  description: 'Calm, guided-meditation narrator',
  tags: ['meditation', 'british', 'female'],
});

// 4. Use it anywhere Inworld TTS, Realtime API, or voice agents run
const audio = await tts.synthesize({
  voiceId: voice.voiceId,
  modelId: 'inworld-tts-1.5-max',
  text: 'Welcome back. Let us begin with a slow breath.',
});
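
The example above logs only the first characters of each preview. To actually listen, the audio can be decoded and written to disk. The following is a sketch for Node.js that assumes previewAudio is a complete base64-encoded WAV file, as the comment in the example suggests. The PreviewVoice interface and the decodePreviewAudio, looksLikeWav, and savePreviews helpers are hypothetical names, not SDK exports.

```typescript
import { writeFile } from 'node:fs/promises';

// Shape assumed from the example above; the real SDK type may differ.
interface PreviewVoice {
  voiceId: string;
  previewAudio: string; // base64-encoded WAV (assumption)
}

// Decode base64 audio into raw bytes.
function decodePreviewAudio(base64Audio: string): Buffer {
  return Buffer.from(base64Audio, 'base64');
}

// A WAV file starts with "RIFF" and carries "WAVE" at byte offset 8.
function looksLikeWav(bytes: Buffer): boolean {
  return (
    bytes.length >= 12 &&
    bytes.toString('ascii', 0, 4) === 'RIFF' &&
    bytes.toString('ascii', 8, 12) === 'WAVE'
  );
}

// Write each preview to <dir>/<voiceId>.wav and return the paths written.
async function savePreviews(previews: PreviewVoice[], dir = '.'): Promise<string[]> {
  const paths: string[] = [];
  for (const preview of previews) {
    const bytes = decodePreviewAudio(preview.previewAudio);
    if (!looksLikeWav(bytes)) {
      throw new Error(`Preview ${preview.voiceId} does not look like a WAV file`);
    }
    const path = `${dir}/${preview.voiceId}.wav`;
    await writeFile(path, bytes);
    paths.push(path);
  }
  return paths;
}
```

With the result from the design step, `await savePreviews(result.previewVoices)` would leave one .wav file per preview, ready to compare in any audio player before publishing.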

Prefer clicking? Design voices in the portal.

Open the TTS Playground, switch to the Design tab, type a description, and listen to previews side by side. Hit Improve Description to auto-expand short prompts, then save the voice to your workspace with one click. Everything the API does, plus a UI that lets your team design voices without writing code.
Open the playground

FAQ

Voice cloning needs an audio sample of a real voice you already have. Voice design needs only a written description. Cloning reproduces a specific voice; design invents a new one to a spec. Use voice cloning when you have reference audio, and voice design when you want a voice that does not exist yet.
Describe the voice across five dimensions: gender and age, accent or dialect, pitch and pace, tone or emotion, and timbre. Inworld Voice Design prompts accept 30 to 250 characters of English description. Concrete adjectives beat generic ones. "A warm, friendly woman in her late twenties with a soft British accent and calm pacing" produces a more predictable result than "a nice voice".
Each design request returns up to three previews. You set numberOfSamples between 1 and 3. Listen to each preview, then publish the one you want to keep as a permanent voice.
Yes. Previews from the design endpoint are disposable. To keep a voice, call the publish endpoint to convert a preview into a permanent voiceId. The published voice is then available in your workspace and works across every Inworld product the same way a library voice does.
Yes. A published voice is a standard voiceId. You can pass it into the Realtime API session configuration as the output voice, and it will run at the same latency and quality as any built-in voice. Voice design is built for realtime voice agents, not only pre-rendered audio.
Voice design prompts themselves are English only, but the generated voice can speak across the 14 language codes supported by the design API, including English, Spanish, French, German, Italian, Portuguese, Polish, Dutch, Russian, Arabic, Japanese, Korean, Chinese, and Auto. For the full list and any updates, see the voice design documentation.
Voices you design and publish are yours to use in production under the Inworld terms of service, including for commercial applications. Voice design is currently in research preview, so review the latest terms on the Inworld pricing page and documentation for any preview-specific guidance.
Voice design is in research preview. The API, schema, and supported parameters may change as the feature matures. It is stable enough to build with today, and we recommend pinning to a published voiceId rather than regenerating from the same prompt if you need strict consistency.
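
The limits mentioned in the FAQ (30 to 250 characters per prompt, 1 to 3 samples per request) can be checked on the client before a request is sent. The sketch below is illustrative only: DesignRequest and validateDesignRequest are hypothetical names, and how the API itself counts characters (for example, whether surrounding whitespace counts) is an assumption.

```typescript
// Fields mirror the parameter names used in the examples on this page.
interface DesignRequest {
  designPrompt: string;
  numberOfSamples: number;
}

// Limits as stated in the FAQ above.
const MIN_PROMPT_CHARS = 30;
const MAX_PROMPT_CHARS = 250;
const MIN_SAMPLES = 1;
const MAX_SAMPLES = 3;

// Returns a list of problems; an empty list means the request passes
// these client-side checks. The server remains the final authority.
function validateDesignRequest(request: DesignRequest): string[] {
  const problems: string[] = [];

  // Assumption: length is measured on the trimmed prompt.
  const promptLength = request.designPrompt.trim().length;
  if (promptLength < MIN_PROMPT_CHARS || promptLength > MAX_PROMPT_CHARS) {
    problems.push(
      `designPrompt must be ${MIN_PROMPT_CHARS} to ${MAX_PROMPT_CHARS} characters, got ${promptLength}`,
    );
  }

  const samples = request.numberOfSamples;
  if (!Number.isInteger(samples) || samples < MIN_SAMPLES || samples > MAX_SAMPLES) {
    problems.push(
      `numberOfSamples must be an integer from ${MIN_SAMPLES} to ${MAX_SAMPLES}, got ${samples}`,
    );
  }

  return problems;
}
```

Under these checks, a short prompt such as "A calm female narrator." (23 characters) is flagged before it reaches the API, which lines up with the advice to prefer concrete, detailed descriptions.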

Ship a custom voice this afternoon

Sign up, paste a description, pick a preview, publish. Your voice works in TTS, the Realtime API, and every voice agent you build on Inworld.
Copyright © 2021-2026 Inworld AI
Voice Design API: Create a Custom AI Voice from a Text Description | Inworld AI