Introducing Inworld TTS
State-of-the-art and radically affordable Voice AI for every developer.
Today, we are launching Inworld TTS, a new generation of text-to-speech models that deliver cutting-edge quality and latency at the most accessible price on the market. Our flagship model, Inworld TTS-1, offers realistic, context-aware speech synthesis and precise zero-shot voice cloning, outperforming comparable solutions from leading labs.
Inworld TTS-1 is available today via API and can be experienced in the TTS Playground, where you can test pre-built voices or clone your own from a short audio sample.
We are also releasing Inworld TTS-1-Max, a larger, more expressive model, as an experimental release.
Powering the Next Generation of AI Applications
For too long, developers have faced a false choice: use high-quality, expressive speech that is slow and expensive, or settle for affordable solutions that lack realism. Our goal was to eliminate this trade-off and build the voice layer for the next generation of consumer AI applications. Here's what makes TTS-1 different.
- Unmatched quality. Inworld TTS delivers rich, emotionally nuanced speech that is virtually indistinguishable from human speech. It captures subtle variations in tone and prosody, making interactions feel natural and engaging. These capabilities are now at your fingertips in 11 languages with Inworld TTS-1 and TTS-1-Max. We're also releasing a research preview of audio markups, such as [happy] or [whispering], which give users a new level of control over how the model speaks, not just what it says.
- Blazing-fast for real-time interactions. With the first 2-second audio chunk ready in as little as 200 ms, Inworld TTS-1 is built for real-time applications. The model is already available through popular AI voice platforms like LiveKit and Vapi, with additional integrations coming soon, and can power everything from educational companions and fitness trainers to shopping assistants and open-world games.
The development and technical achievements of Inworld's TTS-1 were accelerated by partners like Modular and Lightning AI. We'll be sharing more about each of these partnerships and use cases in the coming weeks.
- Radically affordable for every developer. State-of-the-art AI should not be a luxury. We've optimized our entire stack to offer Inworld TTS-1 at a disruptive price of $5 per 1 million characters. On top of that, we've made our powerful zero-shot voice cloning free for all users. Now every developer and team, from indie hackers to enterprises, can integrate production-grade voice AI into their products without breaking the budget.
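To make the quoted price concrete, here is a minimal Python sketch that converts $5 per 1 million characters into a cost estimate. It assumes simple linear per-character billing with no minimums, tiers, or rounding; the function name `estimate_tts_cost` and the traffic figures in the usage example are hypothetical, and actual billing rules may differ.

```python
from decimal import Decimal

# Price quoted in the announcement: $5 per 1 million characters.
PRICE_PER_MILLION_CHARS_USD = Decimal("5")


def estimate_tts_cost(num_characters: int) -> Decimal:
    """Hypothetical helper: estimate cost in USD for a number of input characters.

    Assumes linear per-character billing with no minimums, volume tiers,
    or rounding. Decimal avoids binary floating-point error on small amounts.
    """
    if num_characters < 0:
        raise ValueError("num_characters must be non-negative")
    return PRICE_PER_MILLION_CHARS_USD * num_characters / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 spoken replies of about 500 characters each per day.
    daily_chars = 500 * 10_000
    print(f"${estimate_tts_cost(daily_chars)} per day")  # prints "$25 per day"
```

At this rate, a workload of 5 million characters per day comes to $25, and a single character costs $0.000005.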
We are excited to see how developers across all verticals will leverage our tech to build experiences we haven't even imagined.
A Commitment to Open Innovation
We believe that transparency and community collaboration are the catalysts for true progress. In that spirit, we are making our research accessible to all.
In the coming weeks, we will publish a detailed technical report on Inworld TTS's architecture and training methodology. Furthermore, we will open-source our ready-to-use training repository on GitHub under a commercially permissive license. It will provide a step-by-step guide to recreating our work, from SpeechLM pre-training to SFT and RLHF, empowering researchers and developers to build upon our foundation.
This is just the beginning. We'll keep working to improve our models' quality and affordability. This TTS architecture has proven to be an incredibly flexible framework, and we are already experimenting with new capabilities, such as creating voices from natural language descriptions, which we plan to release later this year.
Trust & Safety
Powerful technology demands profound responsibility. We are committed to ensuring our voice generation technology is used safely and ethically.
- All synthesized audio from our platform contains an imperceptible watermark so that it can be identified as AI-generated.
- We have implemented robust safeguards to prevent the cloning of voices without explicit consent.
- We actively prohibit, and will act against, any uses that violate our policies, such as malicious impersonation or fraudulent activity.
We are dedicated to collaborating with the broader research community to advance safety standards for all voice AI.
How to Get Started
Experience the Inworld TTS difference today:
For fine-tuned voice clones with even higher fidelity, or customized enterprise plans for high-volume use cases, please reach out to our team for more information.
Let's Build Together
Your feedback is crucial as we refine and expand our TTS capabilities. If you have suggestions or encounter issues, please share them with our team via the feedback form in the Inworld Portal.
We can't wait to see what you build.
Appendix