
curl -X POST https://api.inworld.ai/tts/v1/voice:stream \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hi! What can I help you with today?",
    "voice_id": "Clive",
    "model_id": "inworld-tts-1.5-max",
    "audio_config": {
      "audio_encoding": "OGG_OPUS",
      "sample_rate_hertz": 16000
    }
  }'

Type or paste text, pick a voice, hear it instantly.
3 of the top 5 models on Artificial Analysis are Inworld. Blind tests by thousands of real users, not internal evals. TTS-1.5 Max delivers over 30% more expressiveness than previous models, with optimized stability to eliminate hallucinations and artifacts.
Test quality in Playground


Create custom voices instantly from 15 seconds of audio or a text description. Fine-tune with professional voice cloning for maximum fidelity. All methods produce production-ready voices you can use in the Playground or via API.
Built for realtime from the ground up: audio streams over WebSocket the instant it's synthesized, with no buffering delay. Latency is comparable to competitors at a fraction of the cost.
English, Spanish, French, Korean, Chinese, Hindi, Japanese, German, and more. Native-speaker quality in every language with cross-lingual cloning. Deploy globally without separate pipelines.
Explore voices

TTS-1.5 Mini starts at $15/million characters. TTS-1.5 Max at $30/million. The next best option is over $150. Scale to millions of users without scale-related cost anxiety.
View pricing
Integrate Inworld TTS streaming into my app for real-time audio playback.
Read the linked documentation below before generating code — it contains the complete API specification, authentication details, and working examples.
## Configuration
- Model: inworld-tts-1.5-max
- Voice: Clive
- Audio output: LINEAR16 streamed as NDJSON chunks
- Speaking rate: 1
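The configuration above could map onto a JSON request body roughly as follows. This is a minimal sketch, assuming the field names used in the curl example on this page (`text`, `voice_id`, `model_id`, `audio_config`). The helper name `build_payload` and the 16000 Hz default are illustrative, and the speaking rate is left out because its exact field name is not given here; check the linked API reference before relying on any of it.

```python
import json


def build_payload(text, voice_id="Clive", model_id="inworld-tts-1.5-max",
                  audio_encoding="LINEAR16", sample_rate_hertz=16000):
    """Return the request body as a JSON string (hypothetical helper).

    Field names are copied from the curl example; the speaking-rate
    field is omitted because its name is not stated in this document.
    """
    if not text:
        raise ValueError("text must be non-empty")
    return json.dumps({
        "text": text,
        "voice_id": voice_id,
        "model_id": model_id,
        "audio_config": {
            "audio_encoding": audio_encoding,
            "sample_rate_hertz": sample_rate_hertz,
        },
    })
```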
## What to build
- POST to https://api.inworld.ai/tts/v1/voice:stream
- Read the NDJSON response stream — each line contains { "result": { "audioContent": "<base64>" } }
- Decode and play each audio chunk as it arrives for low-latency playback
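The read-and-decode steps above can be sketched as a small generator. It assumes only the line shape described in the list, `{ "result": { "audioContent": "<base64>" } }`; the function name `iter_audio_chunks` is hypothetical, and any other fields or error objects in the stream are ignored rather than handled.

```python
import base64
import json


def iter_audio_chunks(lines):
    """Yield decoded audio bytes from an iterable of NDJSON lines.

    `lines` may yield str or bytes; blank lines and lines without
    result.audioContent are skipped.
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty lines between records
        message = json.loads(raw)
        audio_b64 = (message.get("result") or {}).get("audioContent")
        if audio_b64:
            yield base64.b64decode(audio_b64)
```

Any object that iterates line by line can be passed as `lines`, for example the response returned by `urllib.request.urlopen`. Each yielded chunk can then be handed to an audio player as soon as it arrives.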
## Authentication
Pass the API key in the Authorization header:
Authorization: Basic <YOUR_API_KEY>
IMPORTANT: Do NOT hardcode the API key. Store it in an environment variable.
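One way to follow this rule is to read the key from the environment when building the request headers. This is a minimal sketch: the variable name `INWORLD_API_KEY` matches the curl example on this page, while the helper name `auth_headers` is illustrative.

```python
import os


def auth_headers(env_var="INWORLD_API_KEY"):
    """Build request headers from an environment variable (hypothetical helper)."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError(f"Set the {env_var} environment variable")
    return {
        "Authorization": f"Basic {api_key}",
        "Content-Type": "application/json",
    }
```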
## Reference documentation
- TTS Overview: https://docs.inworld.ai/tts/tts
- HTTP Streaming API: https://docs.inworld.ai/api-reference/ttsAPI/texttospeech/synthesize-speech-stream
- WebSocket Streaming API: https://docs.inworld.ai/api-reference/ttsAPI/texttospeech/synthesize-speech-websocket
## MCP Server
For full API context in your coding agent, add: https://docs.inworld.ai/mcp