Blog Post · Pricing Update
A consumer app pays for AI on every session, but most of its users never pay, and the ones who do spend only a few dollars a month. That math puts a wall in front of every consumer product, and it decides whether you scale or stall. We are taking that wall down for the people who serve everyone, by cutting prices in half or more for most developers across the whole stack. That includes text-to-speech, speech-to-text, the LLM, and compute, with prices dropping further as you scale.
Here is the problem and the fix in one picture: on a flat rate your bill climbs in lockstep with engagement, so the more people use your app the harder cost pushes back.
Lower the unit cost at every tier and the whole spend curve bends down with it, so growth stops being the thing that breaks your economics.
A consumer app's costs start climbing the moment it works. The three: it has to scale, the people it serves are the buyers, and most of its cost is the AI itself. Each is hard on its own, and all three get harder because cost rises with the very engagement the product is built to create.
Scale is a necessity
When an app takes off, spending does not rise in proportion, because one line of the bill grows faster than every other cost you carry, and faster than anything a small team can offset.
Running the AI is that runaway line, so cutting its cost is the difference between scale that compounds your margin and scale that quietly drains it, which raises the obvious question of who is even paying.
Consumers are the buyers
A consumer app and an enterprise tool can look identical on screen, yet they run on opposite economics, and the split shows up the moment you ask who actually sends money.
A small minority of payers, each worth a sliver of one enterprise seat, has to fund the compute every free session burns, so the price of each unit is what quietly decides survival.
AI is most of the cost
You already know this one because you pay it every month: hunting for slightly cheaper tokens does not save you when usage grows faster than the discount; the bill tracks how much people engage.
Volume discounts soften the slope without ever flattening it, since growing means using more, so the only lever that holds is a structurally lower cost. Here is what happens when teams get one.
None of these problems vanish with a cheaper bill, but every one of them gets easier. Scale becomes affordable, more markets come into reach, and a small team can run a real product without infrastructure eating the margin. Teams already building on Inworld show what changes, and the reason the lever works is that the cost lives almost entirely in three places.
~85%
lower on TTS cost
Bible Chat
800K+ daily users

~40%
lower on TTS cost
Talkpal
10M+ learners
"We scaled from prototype to 1 million users in 19 days with over 20× cost reduction."
Fai Nur · CEO, Status
"Inworld reduced our TTS costs by about 10x, while remaining neutral or better on all the metrics we care about, which is pretty incredible."
Creston Brooks · Co-founder & CTO, Luvu
"We chose Inworld because of its low latency, high-quality output, multilingual support and competitive pricing."
Dimitri Dekanozishvili · Co-Founder, Talkpal
The biggest of these cuts is not a discount: Wishroll split one giant prompt into small, task-tuned models through the Router, and engagement held. The full mechanics are in the Router section below. Beyond these, some of the largest consumer apps we serve, including one processing more than 600 billion tokens a day, are moving their LLM workloads onto our realtime inference. Customer figures are self-reported.
Follow the money
The biggest labs make most of their money from businesses and developers, and it's not too hard to see why. Enterprises pay for productivity: a company will spend a hundred to a few hundred dollars per seat to save an employee time. On the other hand, only about 3% of consumer AI users ever pay, and those who do pay five to twenty dollars a month for something they enjoy.
So the models were built for that first buyer. The voice infrastructure especially was priced for enterprise and tuned for narrow jobs like support calls, IVR, and transcription, not for the range of things consumers actually do. And for a consumer, almost the entire cost of a session is the AI itself, split across the LLM, text-to-speech, and speech-to-text. Pricing that works for a per-seat tool doesn't work for an app that pays for every minute someone uses it.
Follow the money inside one live conversation and nearly every cent lands in the same place: the models talking, listening, and thinking, second by second, for everyone you are lucky enough to keep engaged.
Whoever owns the LLM, the text-to-speech, and the speech-to-text owns almost the whole cost of a session, which means they, not you, decide whether an app built for the full breadth of human experience can afford to grow.
Why we are doing this
We are a research lab, and we measure this technology by the breadth of human experience it reaches, not a handful of business tasks. If only enterprise economics work, only enterprise problems get built. We would rather a wider range of developers serve a wider range of people: the companion at midnight, the tutor, the coach, the daily check-in.
We can lower the wall because our research delivers both quality and efficiency, and we own the layers where cost adds up. We pass those savings on instead of marking them up.
Enterprise voice covers one narrow band of what people actually do with their voice. Here is the whole spectrum a consumer developer has to serve.
That whole curve is who this price cut is for.
What we did
New pricing is live today for every developer. For most developers it drops by half or more across every layer of a voice app, and it falls further as you scale. Here is each layer, what it does for a consumer app, and where it lands.
Text-to-speech
Top-ranked voice, a fraction of the cost
Realtime TTS-2 is top-ranked on independent speech benchmarks. It supports steering and conversational context, and a single voice works across languages, so you can open new markets without switching models. It answers in realtime, and it costs a fraction of the comparable premium voices.
Down to ~$10 / 1M charsSpeech-to-text
Fast, accurate, among the lowest cost
Our speech-to-text ranks top-4 in quality and reads emotion, accent, and intent as the user speaks, so the app responds to a person. It handles long, continuous sessions that call-center transcribers can't, and it now costs less than every major streaming rate we benchmark.
Down to ~$0.10 / hourThe LLM, through the Router
Hundreds of models, no markup
The LLM is a major line. Through the Router you reach 220+ models from every major lab and route to the best one for each job with a config change, not a migration. Routed third-party models carry no markup, where a typical gateway adds about 5%, which compounds into real money at scale. And the biggest wins come from decomposition: we helped one social app split one giant prompt into small, task-tuned models, cutting AI cost about 95% while engagement held.
No markup · up to ~95% lowerRealtime inference, a new product
Optimized open models, up to 50% below the public rate
Realtime inference is a new product. The same team that tuned our voice models now hosts the top open models consumer apps run in production and serves them up to 50% below the public rate, with better latency and reliability, accessed through the Router. Three of the ten highest-volume consumer apps we work with are already moving their LLM workloads onto it.
Up to 50% off, better latencyCompute
Fixed GPUs when you reach real scale
At the largest scale, per-token billing is not always the right shape. You can move to dedicated GPUs from $5 per GPU-hour, less than half the on-demand rate of a hyperscaler, and transition at the crossover where fixed compute beats variable.
From $5 / GPU-hourAnd it compounds
Today's prices are the ceiling, not the floor. As your total spend grows, the unit price of every layer falls. We own the layers where cost adds up, and scale makes us more efficient, not less. Your unit price drops by tier, all the way to enterprise. Shown here for Realtime TTS-2; speech-to-text and the LLM follow the same shape.
You saw the shape of this curve at the top of the post. Here it is with your spend on it: the total grows far slower than on a flat rate. You choose how to buy, trading flexibility for a lower, more predictable rate, up to about 90% off with fixed compute for the largest workloads.
Discounts on top of a high base never fixed the math. Discounts on top of a structurally lower base do, and fixed compute is the point where the line finally flattens. Pick the shape that matches your stage; the rate only moves one way as you grow.
Human feel
Voice profiling reads emotion and intent as the user speaks, and steerable, audio-native TTS-2 answers in kind, so a session adapts to the person.
Modularity
Reach any model you need through one API, and swap it with a config change instead of a migration.
Unified billing and management
One bill across every layer, volume discounts that apply across the stack, and per-team and per-user controls.
Experiment freely
A/B test models and variants on live traffic to find what improves both the experience and the margin.
Our commitment
The price cut is the visible part. Underneath it is a stack we build and own end to end, where each layer rests on the one below. Read it from the surface down.
Layer 01 · APIs
Closest to the user
Simple, reliable APIs
Layer 02 · Inference
Where the price cut comes from
Optimized inference and model management
Layer 03 · Research
The bedrock
Research to serve the breadth of human experience
Voice profiling reads the user, and steering makes responses expressive and empathetic. Realtime latency is the bar for engaging a consumer, and one cross-lingual voice reaches users across languages and markets, with the control developers need made cost-effective.
Deep optimization applied to the top open models consumer apps run in production means better performance, cost, and reliability, with universal access through the Router and unified billing across every model.
Where others narrow their focus, we want to enable the diversity of users and use cases that consumer represents, across voice, language models, and code. The breadth is the point, and it is the layer everything above is built to serve.
No single layer is the moat. The integrated whole is.
New pricing is live today for every developer. We build the best realtime models for voice and speech, optimize the inference that serves the LLMs you run, and keep lowering the cost, so a wider diversity of developers can bring AI to everyone.
Questions
Realtime TTS-2 drops to about $10 per million characters at scale (TTS-1.5 Mini to about $5), speech-to-text falls to about $0.10 per hour, and LLMs run at cost through the Router. Every rate falls further as you scale, and most consumer apps land on enterprise tiers. Exact figures are at inworld.ai/pricing.
Pricing falls smoothly as your total spend grows, and it falls per layer, so the unit price of speech-to-text, the LLM, and text-to-speech each drops as you scale. You choose how to buy: pay-as-you-go, a subscription, a volume commit for a lower rate, or fixed compute for the largest workloads. Spend on one layer lowers the others, on one combined commit.
The Router reaches hundreds of models from OpenAI, Anthropic, Google, Mistral, Meta, DeepSeek and more. Routed third-party models carry no markup, and you can also route to Inworld's first-party realtime inference. It is OpenAI-SDK compatible, so you swap a model with a config change, not a migration.
First-party hosted, optimized versions of the top open models consumer apps run in production, accessed through the Router. The same team that optimized our voice models tuned the serving, so it runs up to 50% below the public third-party rate while improving latency and reliability.
No. Realtime TTS-2 is top-ranked on independent speech benchmarks and our speech-to-text ranks top-4, and that holds while the price falls. The efficiency comes from owning the layer and optimizing the inference, not from cutting corners. It is structural where we own the layer; routed third-party models carry no markup, but their underlying economics are the provider's.
Yes. For high-scale workloads you can move off per-token billing to dedicated GPUs from $5 per GPU-hour, and transition when the economics make sense for you.
Yes. For enterprise workloads we price match TTS models of comparable quality. Bring us your current bill and we will work through the numbers.
New pricing is live at inworld.ai/pricing for every developer, not gated to new accounts. Start free at inworld.ai, open a connection to the Realtime API, and the Router gives you unified billing, usage analytics, A/B testing, and cost control across every model.