01.21.2026 · Product Updates

Inworld TTS-1.5: Upgrading the #1 Ranked TTS Model with Production-Grade Latency, Expression and Stability

We’re releasing Inworld TTS-1.5, the fastest, highest-quality realtime voice AI models available. With time-to-first-audio P90 latency under 250ms for TTS-1.5 Max and under 130ms for TTS-1.5 Mini (4x faster than prior generations), plus top rankings on independent leaderboards, this release sets a new standard for developers building voice-enabled applications at scale. TTS-1.5 builds on Inworld models that already rank #1 on leaderboards, adding 30% greater expressiveness, a 40% reduction in word error rate, and enhanced multilingual support. It also costs more than 25x less than alternatives. Whether you're powering conversational AI agents, live translation, or interactive media experiences, TTS-1.5 gives you the world’s best text-to-speech without compromise.
Inworld TTS-1.5 Max is recommended for most applications, while TTS-1.5 Mini is optimized for the most latency-sensitive applications.

Production-grade realtime latency: Professional voice actor quality at human-native speeds

For realtime applications, latency isn't just a metric: it's the difference between a natural conversation and an awkward delay. TTS-1.5 delivers breakthrough speed improvements that unlock new categories of realtime experiences.
Our new TTS-1.5 models achieve time-to-first-audio P90 latency under 250ms for our Max model and under 130ms for our Mini model. This is a 4x improvement over prior generations. The Max model now delivers quality previously only achievable at much higher latencies, running nearly as fast as the Mini model while producing richer, more expressive speech.
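To make the P90 figure concrete: P90 time-to-first-audio is the latency that 90% of requests meet or beat. The minimal sketch below computes it with the nearest-rank percentile method over a list of measured latencies. The sample values are hypothetical, and the method is an illustrative assumption, not Inworld's published benchmarking procedure.

```python
def p90(latencies_ms: list[float]) -> float:
    """Return the 90th-percentile latency using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one measurement")
    ordered = sorted(latencies_ms)
    # Nearest rank = ceil(0.9 * n), computed with integers to avoid float error.
    rank = (90 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical time-to-first-audio samples (ms) for ten requests.
samples = [180, 195, 210, 170, 240, 205, 190, 230, 260, 200]
print(p90(samples))        # → 240
print(p90(samples) < 250)  # → True
```

Nearest-rank is used here because it always returns an observed measurement; interpolating percentile methods can report slightly different values on small samples.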

Engagement-optimized quality: Upgrade every user experience with leading expression and stability

Speed means nothing if quality suffers, and TTS-1.5 delivers both. Our models rank #1 on the Artificial Analysis TTS Leaderboard. What makes this ranking particularly meaningful is that it reflects blind comparisons by thousands of real users evaluating which outputs sound more natural and human. When developers and end users consistently choose Inworld TTS over alternatives, that's validation that matters.
Beyond third-party validation, TTS-1.5 is 30% more expressive than prior generations and achieves a 40% reduction in word error rate, with fewer hallucinations, cutoffs, and artifacts. The result is speech that's virtually indistinguishable from human speech: emotionally nuanced, contextually aware, and reliably accurate.
An expanded range of expression supports many new consumer-facing use cases and applications where the personality of a voice really matters to engage, retain, and convert every user. Ultimately, that means better business outcomes across the next wave of AI applications.
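For readers less familiar with the word error rate metric mentioned above: WER counts the word substitutions, deletions, and insertions needed to turn a hypothesis into a reference, divided by the number of reference words. For TTS it is commonly measured by transcribing the synthesized audio with a speech recognizer and comparing the transcript against the input text; this post does not describe Inworld's exact evaluation setup, so the sketch below only illustrates the standard metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# A cutoff drops the final word: one deletion out of five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown fox"))  # → 0.2
```

Cutoffs show up as deletions, hallucinated words as insertions, and mispronunciations that the recognizer mishears as substitutions, which is why all three failure modes named above move this single number.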


Unlocking consumer scale: Enhanced multilingual support, still 25x lower cost than alternatives

State-of-the-art voice AI should be accessible to every developer, from indie hackers building their first voice app to enterprises scaling to millions of users.
Language support now spans 15 languages, with the addition of Hindi and expanded coverage across major world languages. Combined with on-prem deployment options, TTS-1.5 serves global enterprises with diverse requirements for data residency, compliance, and customization.
Most significantly, TTS-1.5 remains 25x more affordable than the next best model, a gap that has only widened as competitors have raised prices. At $0.005 per minute for TTS-1.5 Mini and $0.01 per minute for TTS-1.5 Max, we're keeping our commitment to radically accessible pricing that doesn't force developers to choose between quality and budget.
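As a quick budgeting aid, the per-minute prices above can be turned into a spend estimate. This is a minimal sketch that assumes list prices apply linearly per minute of generated audio, with no volume discounts; the model keys are hypothetical labels, not official API model IDs.

```python
from decimal import Decimal

# Per-minute list prices quoted in this post, in USD.
# The dictionary keys are hypothetical labels, not official API model IDs.
PRICE_PER_MINUTE = {
    "tts-1.5-mini": Decimal("0.005"),
    "tts-1.5-max": Decimal("0.01"),
}


def estimate_cost(model: str, minutes: int) -> Decimal:
    """Estimate list-price spend, assuming linear per-minute billing."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Decimal avoids binary floating-point rounding on currency values.
    return PRICE_PER_MINUTE[model] * minutes


# A hypothetical workload of one million minutes of generated audio per month.
for model in PRICE_PER_MINUTE:
    print(model, estimate_cost(model, 1_000_000))
```

At one million minutes, this works out to $5,000 for Mini and $10,000 for Max; a single 10-minute conversation costs $0.05 on Mini.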


What Inworld TTS-1.5 unlocks: use-case inspiration

Bible Chat, Particle, Luvu, Talkpal, Astrobeam, and many others are proving what's possible when developers have access to consumer-grade voice AI. The combination of sub-250ms latency, benchmark-leading quality, and accessible pricing opens new possibilities:
  • Conversational AI agents: Build voice assistants that respond naturally, without the awkward pauses that break immersion. TTS-1.5's speed makes multi-turn conversations feel genuinely fluid.
  • Realtime translation and dubbing: Live interpretation requires voice synthesis that keeps pace with speakers. TTS-1.5 delivers the latency profile that makes realtime language bridging viable at scale.
  • Interactive entertainment: From AI companions to narrative experiences, TTS-1.5 enables characters that speak with emotional range and contextual awareness, responding in realtime to user input.
  • Accessibility applications: Screen readers, navigation aids, and assistive technologies benefit from natural-sounding speech that doesn't fatigue listeners or create cognitive load.
“We're blown away with Inworld’s latest models which achieve unmatched voice realism at a fraction of the cost. We’re excited to bring these models to Layercode where developers can create and deploy realtime latency, life-like voice agents with them.”
Damien Tanner, CEO, Layercode


Enterprise-ready deployment options, now supporting on-prem

TTS-1.5 supports the deployment flexibility enterprises require:
  • Cloud API: Immediate access via our standard API with global availability
  • On-prem deployment: Full model hosting on your infrastructure
  • Custom solutions: Contact our enterprise team for volume pricing, SLAs, and tailored deployment architectures
For organizations with strict data residency requirements or regulatory constraints, on-premise deployment provides complete control over voice synthesis without sacrificing capability.
TTS-1.5 is also available now via Layercode, LiveKit, NLX, Pipecat, Stream Vision Agents, Ultravox, Vapi, and Voximplant.
“I see an inflection in the not-so-distant future where conversational voice becomes the primary interface. The exciting thing is a lot of the technologies, like Inworld's realtime TTS, that need to come together to make this a reality are already here. And they're only getting better. So it's super exciting to operate in this space with partners like Inworld setting the pace for this innovation.”
Andrei Papancea, CEO & Co-Founder, NLX

Get started today

TTS-1.5 is available now:
  1. Try the TTS Playground: Hear TTS-1.5 in action with your own text, or clone a voice from a sample
  2. Read the documentation: API reference, SDKs, and integration guides
  3. Contact enterprise sales: Volume pricing, on-premise options, custom voice development
We're just getting started. TTS-1.5 represents our most significant voice AI release yet and the foundation for what's coming next. We can't wait to see what you build.
Questions? Reach out to our team today

Copyright © 2021-2026 Inworld AI