Why Inworld + NLX for Voice+ AI
- Conversations that differentiate brands: Build, deploy, and analyze voice applications that address any use case in any industry with NLX, including contact center automation, AI assistants, and integrations with the most common digital channels (e.g., messaging apps, voice assistants).
- Low latency: Inworld voices support ~200ms latency to the first audio chunk, enabling engaging use cases for consumer applications in entertainment, hospitality, travel, retail, and more.
- Multimodal made easy: Voice and digital channels operate in real-time synchrony through NLX's patented Voice+ technology, creating a natural, seamless conversational experience that feels like talking to a human.
- Multilingual: Build agents in 11 of the most common languages for consumer applications, including English (with its various accents), Chinese, Korean, Dutch, French, Spanish, and more.
- Scale affordably: SOTA-quality voices for just $15/1M characters (TTS 1.5-Mini), 75% cheaper than other providers, so you can build interactive experiences that scale and evolve with users' preferences and behaviors.
- Zero-shot voice cloning: Leverage Inworld's voice cloning capabilities to bring characters, brands, and assistants to life with emotion and personality using just 5-15 seconds of audio.
How it works
- Select Inworld as your preferred TTS provider in the NLX platform's Integrations settings and configure your voice preferences.
- Use Inworld's API or embed Inworld voices via NLX's voice gateway to enrich any step of the customer journey in your application; a sketch of what such a request might look like follows below. For full setup steps, see the NLX Integration Documentation.
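To make step two concrete, here is a minimal Node.js sketch of what a direct synthesis request might look like. The endpoint path, request fields, and voice/model identifiers below are assumptions made for illustration, not Inworld's documented API; consult the NLX Integration Documentation and Inworld's API reference for the real contract.

```typescript
// Hypothetical sketch: the endpoint, request fields, and identifiers below
// are assumptions for illustration, not Inworld's documented API.
const INWORLD_TTS_URL = "https://api.inworld.ai/tts/v1/voice"; // assumed endpoint

async function synthesize(text: string): Promise<Buffer> {
  const response = await fetch(INWORLD_TTS_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.INWORLD_API_KEY}`, // assumed auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text,                     // the line to speak
      voiceId: "brand-voice-1", // hypothetical voice identifier
      modelId: "tts-1.5-mini",  // hypothetical identifier for TTS 1.5-Mini
    }),
  });
  if (!response.ok) throw new Error(`TTS request failed: ${response.status}`);
  // Assume the API returns raw audio bytes; a real integration would likely
  // stream chunks to benefit from the ~200ms first-chunk latency cited above.
  return Buffer.from(await response.arrayBuffer());
}

synthesize("Thanks for calling! How can I help you today?")
  .then((audio) => console.log(`received ${audio.length} bytes of audio`));
```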
Interested in learning more about Inworld TTS?
Built to accelerate our internal development, now enabling all consumer builders
“We built Runtime because existing tools couldn't deliver at the speed and scale our partners required. When we realized every consumer AI company faces these same barriers, we knew we had to open up what we'd built. Thousands of builders are hitting the same scaling wall we did, so we hustled over the past year to create the universal backend to accelerate the entire consumer AI ecosystem.”
Evgenii Shingarev, VP of Engineering
We learned the three factors that determine leaders in consumer AI

- Time from prototype to production: While creating an AI demo takes hours, reaching production readiness typically requires 6+ months of infrastructure and quality work. Teams must handle provider outages, implement fallbacks, manage rate limits, provision and scale compute capacity, optimize costs, and ensure consistent quality. Building with category leaders, we saw that most consumer AI projects never make the leap; they stall out and die in the gap between prototype and scalable reality.

- Resource allocation to new product development: Most engineering teams spend over 60% of their time on maintenance: debugging provider changes, managing model updates, handling scale issues, and optimizing costs. This leaves minimal resources for building new features, so products stagnate while competitors advance. We experienced this firsthand; even innovative teams get trapped in maintenance cycles instead of building what users want next.

- Experimentation velocity: Consumer preferences continuously evolve, but traditional deployment cycles of 2–4 weeks cannot match this pace. Teams need to test dozens of variations, measure real user impact, and scale winners — all without the friction of code deployments and app store approvals. Working with partners across the industry showed us that the fastest learner wins, but existing infrastructure makes rapid iteration nearly impossible.
“We scaled from prototype to 1 million users in 19 days with over 20× cost reduction”
Fai, Status CEO
Inworld Runtime’s technical design
- Adaptive Graphs: A C++ based graph execution system that solves the scaling and cross-platform limitations of most AI frameworks, with SDKs for Node.js, Python, and more. Developers compose applications from pre-optimized nodes as building blocks (wrapping APIs from top providers for LLM, TTS, STT, knowledge, memory, and much more) that handle low-level integration work and automatically optimize data streams between components. The same graph scales seamlessly from 10 test users to 10 million concurrent users with minimal code changes and managed endpoints. Together with vibe-coding-friendly interfaces, this enables the leap from prototype to production in days, not months. (A conceptual sketch of the node-graph pattern appears after this list.)

- Automated MLOps: Beyond basic operations, Runtime provides self-contained infrastructure automation with integrated telemetry capturing logs, traces, and metrics across every interaction. Actionable insights, such as identifying bugs, user patterns, and optimization opportunities, are surfaced through the Portal, our observability and experiment management platform. Runtime performs automatic failover between providers, manages capacity across models, and handles rate limiting intelligently. It also supports custom on-premise deployments with optimized model hosting for enterprises. As applications scale, we provide access to all necessary cloud infrastructure to train, tune, and host custom models that break the cost-quality frontier of default models.

- Live Experiments: Deploy or scale experiments in one click. Because configuration is separated from code, teams can run instant A/B tests without deployment friction: define variants via the SDK, manage tests through the Portal, and let Runtime run hundreds of experiments simultaneously across different models, prompts, graph configurations, and logic flows. Changes deploy in seconds, with automatic measurement of impact on user metrics. (A sketch of config-driven variant assignment appears below.)
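To make the Adaptive Graphs bullet concrete, here is a conceptual TypeScript sketch of the node-graph pattern: typed nodes composed into a voice pipeline. Every name in it (GraphNode, pipe, the mock STT/LLM/TTS nodes) is invented for illustration and is not the Inworld Runtime SDK's actual API.

```typescript
// Conceptual sketch of the node-graph pattern. All names here are invented
// for illustration; they are not the Inworld Runtime SDK's actual API.
interface GraphNode<In, Out> {
  run(input: In): Promise<Out>;
}

// Mock nodes standing in for pre-optimized, provider-backed building blocks.
const sttNode: GraphNode<Uint8Array, string> = {
  run: async (audio) => `transcript of ${audio.length} audio bytes`,
};
const llmNode: GraphNode<string, string> = {
  run: async (transcript) => `reply to: ${transcript}`,
};
const ttsNode: GraphNode<string, Uint8Array> = {
  run: async (reply) => new TextEncoder().encode(reply), // pretend audio
};

// Compose two nodes into one; a real runtime would also stream data
// between nodes and optimize the execution plan.
function pipe<A, B, C>(
  first: GraphNode<A, B>,
  second: GraphNode<B, C>,
): GraphNode<A, C> {
  return { run: async (input) => second.run(await first.run(input)) };
}

// STT -> LLM -> TTS: the same composed graph could serve 10 users or
// 10 million behind a managed endpoint without restructuring.
const voiceAgent = pipe(pipe(sttNode, llmNode), ttsNode);
voiceAgent.run(new Uint8Array(16_000)).then((audio) =>
  console.log(`agent produced ${audio.length} bytes of audio`),
);
```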

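And to illustrate how configuration separated from code enables instant A/B tests, here is a hedged sketch of variant assignment driven by remotely managed config. The Variant shape and the bucketing scheme are generic illustrations invented for this example, not the Portal's actual mechanism.

```typescript
// Generic sketch of config-driven A/B testing; the config shape and
// bucketing scheme are illustrative, not the Portal's actual mechanism.
interface Variant {
  name: string;
  prompt: string;       // e.g. different system prompts under test
  trafficShare: number; // fraction of users in [0, 1]
}

// In a real setup this config would be fetched from an experiment service,
// so changing it requires no code deployment or app store review.
const experiment: Variant[] = [
  { name: "control",  prompt: "You are a helpful assistant.",      trafficShare: 0.5 },
  { name: "friendly", prompt: "You are a warm, upbeat assistant.", trafficShare: 0.5 },
];

// Deterministically bucket a user so they always see the same variant.
function assignVariant(userId: string, variants: Variant[]): Variant {
  let hash = 0;
  for (const ch of userId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  let point = (hash % 1000) / 1000; // position in [0, 1)
  for (const v of variants) {
    if (point < v.trafficShare) return v;
    point -= v.trafficShare;
  }
  return variants[variants.length - 1];
}

const variant = assignVariant("user-42", experiment);
console.log(`user-42 sees '${variant.name}' with prompt: ${variant.prompt}`);
// Each interaction would then be logged with the variant name so the
// experiment platform can measure impact on user metrics automatically.
```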
Proven results from early adopters of Inworld Runtime
- Our largest partners (major IP owners, media companies, and AAA studios) are already leveraging Runtime as the foundation of their AI stacks
- Wishroll scaled from prototype to 1 million users in 19 days with over 95% cost reduction
- Little Umbrella ships new AI games while using Inworld to reduce update and maintenance effort for existing titles
- Streamlabs built a multimodal real-time streaming assistant with features that were infeasible even six months ago
- Bible Chat upgraded and scaled their voice features while reducing voice costs by 85%
- Nanobit delivers personalized AI narratives to millions at sustainable unit economics