Introducing Inworld TTS
Published on June 25, 2025

Today, we are launching Inworld TTS, a new generation of text-to-speech models that deliver cutting-edge quality and latency at the most accessible price on the market. Our flagship model, Inworld TTS-1, offers realistic, context-aware speech synthesis and precise zero-shot voice cloning, outperforming comparable solutions from leading labs.
Inworld TTS-1 is available today via API and can be experienced in the TTS Playground, where you can test pre-built voices or clone your own from a short audio sample.
We are also releasing Inworld TTS-1-Max, a larger, more expressive model, as experimental.
Powering the Next Generation of AI Applications
For too long, developers have faced a false choice: use high-quality, expressive speech that is slow and expensive, or settle for affordable solutions that lack realism. Our goal was to eliminate this trade-off and build the voice layer for the next generation of consumer AI applications. Here’s what makes TTS-1 different.
- Unmatched quality. Inworld TTS delivers rich, emotionally nuanced speech that is virtually indistinguishable from human speech. It captures subtle shifts in tone and prosody, making interactions feel natural and engaging. This power is now at your fingertips in 11 languages with Inworld TTS-1 and TTS-1-Max [1]. We’re also releasing a research preview of audio markups, such as [happy] or [whispering], which give users a new level of control over how the model speaks, not just what it says.
[2]
- Blazing-fast for real-time interactions. With the first 2-second audio chunk ready in as little as 200 ms [3], Inworld TTS-1 is built for real-time applications. The model is already available through popular AI voice platforms like LiveKit and Vapi, with additional integrations coming soon, and can power everything from educational companions and fitness trainers to shopping assistants and open-world games. The development of Inworld TTS-1 was accelerated by partners like Modular and Lightning AI. We’ll be sharing more about each of these partnerships and use cases in the coming weeks.
[4]
- Radically affordable for every developer. State-of-the-art AI should not be a luxury. We’ve optimized our entire stack to offer Inworld TTS-1 at a disruptive price of $5 per 1 million characters. On top of that, we’ve made our powerful zero-shot voice cloning free for all users. Now, every developer and team, from indie hacker to enterprise, can integrate production-grade voice AI into their products without breaking the budget.
[5]
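To make the pricing concrete, here is a minimal sketch of a cost estimate at the quoted rate of $5 per 1 million characters. It assumes billing scales linearly with character count and ignores any rounding, minimums, or free tiers, none of which are specified in this post. The function name `estimate_cost_usd` is ours for illustration and is not part of the Inworld API.

```python
from decimal import Decimal

# Rate quoted in this post: $5 per 1 million characters.
PRICE_PER_MILLION_CHARS_USD = Decimal("5")


def estimate_cost_usd(num_chars: int) -> Decimal:
    """Estimate synthesis cost in USD for a given character count.

    Assumption: cost is strictly linear in characters, with no
    rounding or minimum charge (not confirmed by the post).
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return PRICE_PER_MILLION_CHARS_USD * Decimal(num_chars) / Decimal(1_000_000)


if __name__ == "__main__":
    line = "Welcome back! Ready for today's workout?"
    print(f"{len(line)} characters cost about ${estimate_cost_usd(len(line))}")
    print(f"1,000,000 characters cost ${estimate_cost_usd(1_000_000)}")
```

Using `Decimal` rather than floats keeps very small per-line costs exact when you sum them across many requests.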
We are excited to see how developers across all verticals will leverage our tech to build experiences we haven't even imagined.
A Commitment to Open Innovation
We believe that transparency and community collaboration are the catalysts for true progress. In that spirit, we are making our research accessible to all. In the coming weeks, we will publish a detailed technical report on Inworld TTS’s architecture and training methodology.
Furthermore, we will open source our ready-to-use training repository on GitHub under a commercially permissive license. This will provide a step-by-step guide to recreating our work, from SpeechLM pre-training to SFT and RLHF, empowering researchers and developers to build upon our foundation.
This is just the beginning. We’ll keep improving our models’ quality and affordability. The TTS architecture has proven to be an incredibly flexible framework, and we are already experimenting with new capabilities, such as creating voices from natural-language descriptions, which we plan to release later this year.
Trust & Safety
Powerful technology demands profound responsibility. We are committed to ensuring our voice generation technology is used safely and ethically.
- All synthesized audio from our platform contains an imperceptible watermark to ensure it can be identified as AI-generated.
- We have implemented robust safeguards to prevent the cloning of voices without explicit consent.
- We actively prohibit and will act against any uses that violate our policies, such as malicious impersonation or fraudulent activity.
We are dedicated to collaborating with the broader research community to advance safety standards for all voice AI.
How to Get Started
Experience the Inworld TTS difference today:
- Try the TTS Playground to hear the quality for yourself.
- Clone your voice instantly with just a few seconds of audio.
- Read the API Docs and start building now.
For even higher-fidelity, fine-tuned voice clones and customized enterprise plans for high-volume use cases, please reach out to our team.
Let's Build Together
Your feedback is crucial as we refine and expand our TTS capabilities. If you have suggestions or encounter issues, please share them with our team via the feedback form in the Inworld Portal. We can't wait to see what you build.
Appendix