Blog Post · Pricing Update

Cost is the wall in front of consumer AI,
and we are taking it down.

A consumer app pays for AI on every session, but most of its users never pay, and the ones who do spend only a few dollars a month. That math puts a wall in front of every consumer product, and it decides whether you scale or stall. We are taking that wall down for the people who serve everyone, by cutting prices in half or more for most developers across the whole stack. That includes text-to-speech, speech-to-text, the LLM, and compute, with prices dropping further as you scale.


Here is the problem and the fix in one picture: on a flat rate your bill climbs in lockstep with engagement, so the more people use your app the harder cost pushes back.

[Chart: Monthly spend grows slower with Inworld. Total monthly cost as you scale. Series: Other providers ($30/1M), Inworld (tiered), Your savings.]

Source: Illustrative, based on Inworld plan-tier pricing (Developer $15, Growth $12.50, Enterprise $10 per 1M characters) vs a flat $30/1M competitor rate.

Lower the unit cost at every tier and the whole spend curve bends down with it, so growth stops being the thing that breaks your economics.
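To make the comparison concrete, here is a minimal sketch of the arithmetic behind that picture, using only the illustrative rates quoted in the chart's source note. It assumes every character in a month is billed at the plan's rate and ignores subscription minimums and overage rules, which this post does not spell out. The function and variable names are ours, for illustration only.

```python
# Illustrative per-1M-character rates from the chart's source note.
FLAT_RATE = 30.00  # flat competitor rate used for comparison
PLAN_RATES = {"Developer": 15.00, "Growth": 12.50, "Enterprise": 10.00}


def monthly_cost(characters: int, rate_per_million: float) -> float:
    """Cost of one month's characters at a single per-1M rate."""
    return characters / 1_000_000 * rate_per_million


def savings_vs_flat(characters: int, plan: str) -> float:
    """Dollars saved against the flat rate.

    Simplifying assumption: all characters are billed at the plan's rate.
    """
    flat = monthly_cost(characters, FLAT_RATE)
    tiered = monthly_cost(characters, PLAN_RATES[plan])
    return flat - tiered


if __name__ == "__main__":
    volume = 120_000_000  # hypothetical monthly volume
    for plan, rate in PLAN_RATES.items():
        print(plan, monthly_cost(volume, rate), savings_vs_flat(volume, plan))
```

Under these assumptions, 120M characters cost $3,600 at the flat rate and $1,500 on Growth, and the gap widens as volume grows.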

Consumer AI is hard for three reasons. Cost is at the root of all of them.

A consumer app's costs start climbing the moment it works. There are three reasons: it has to scale, the people it serves are also the buyers, and most of its cost is the AI itself. Each is hard on its own, and all three get harder because cost rises with the very engagement the product is built to create.

Scale is a necessity

Apps can reach millions of users in weeks.
Reliability is non-negotiable; an outage loses users and their habits.
A few people run infrastructure built for millions.
Most models were tuned for enterprise, not consumer.

When an app takes off, spending does not rise in proportion, because one line of the bill grows faster than every other cost you carry, and faster than anything a small team can offset.

[Chart: Costs grow with every user. Monthly spend as a consumer app scales; AI inference grows fastest. Series: AI inference ($/mo), Paid acquisition ($/mo), Operations ($/mo), Daily users.]

Illustrative growth model for a scaling consumer AI app. Cost in $K / month; users in thousands of DAU.

[Chart: Margin vs scale. As you scale, lower cost keeps your margin instead of eroding it (illustrative). Series: Without the price cut, With lower cost.]

Source: Illustrative gross-margin model at fixed price points. Both lines start at the same point to isolate the effect of scale.

Running the AI is that runaway line. Cutting its cost is the difference between scale that compounds your margin and scale that quietly drains it. That raises the obvious question: who is actually paying?

Consumers are the buyers

Most users pay nothing; those who do pay $5 to $20 a month.
They expect a natural, realtime response, not a slow workflow.
Cost keeps most teams in English-only markets.
Acquisition is paid, so thin revenue has to cover serving everyone.

A consumer app and an enterprise tool can look identical on screen, yet they run on opposite economics, and the split shows up the moment you ask who actually sends money.

[Chart: Who pays for consumer AI. About 3 in 100 users (~3%) ever pay. Series: Paying user, Free user.]

Source: Menlo Ventures, 2025 State of Consumer AI

[Chart: What a single user is worth. Revenue per user, per month. Series: Consumer app, Enterprise seat.]

Source: a16z, Menlo Ventures, 2025 to 2026

A small minority of payers, each worth a sliver of one enterprise seat, has to fund the compute every free session burns, so the price of each unit is what quietly decides survival.
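A rough sketch of that math, using only the figures already quoted here (about 3% of users pay, and payers spend $5 to $20 a month). Everything else, including the function names, is illustrative. It averages revenue across all users to show how little room there is for serving cost per user.

```python
PAYING_SHARE = 0.03           # about 3 in 100 users ever pay
PRICE_RANGE = (5.00, 20.00)   # what a paying user spends per month, in dollars


def blended_revenue_per_user(price: float, paying_share: float = PAYING_SHARE) -> float:
    """Monthly revenue averaged over every user, paying or not."""
    return paying_share * price


def free_users_per_payer(paying_share: float = PAYING_SHARE) -> float:
    """How many free users each paying user has to fund."""
    return (1 - paying_share) / paying_share


if __name__ == "__main__":
    low, high = (blended_revenue_per_user(p) for p in PRICE_RANGE)
    print(f"Blended revenue per user: ${low:.2f} to ${high:.2f} per month")
    print(f"Free users per payer: {free_users_per_payer():.1f}")
```

On these numbers the blended revenue is roughly $0.15 to $0.60 per user per month, and each payer carries about 32 free users. Serving cost, acquisition, and operations all have to fit under that ceiling.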

AI is most of the cost

AI is the largest cost line, and it grows with every session.
Cheaper tokens do not help when you use far more of them.
Cutting unit cost means locking into a SKU and losing flexibility.
Many models and configs make billing its own problem.

You already know this one because you pay it every month: hunting for slightly cheaper tokens does not save you when usage grows faster than the discount; the bill tracks how much people engage.

[Chart: Your AI bill grows with every user, even after the volume discounts you earn at scale. Series: List rate, With volume discounts.]

Illustrative app-spend model; total monthly AI inference cost by daily active users, list rate vs a typical scaling volume discount.

Volume discounts soften the slope without ever flattening it, since growing means using more, so the only lever that holds is a structurally lower cost. Here is what happens when teams get one.

Lower the cost, and each of these eases.

None of these problems vanishes with a cheaper bill, but every one of them gets easier. Scale becomes affordable, more markets come into reach, and a small team can run a real product without infrastructure eating the margin. Teams already building on Inworld show what changes. The lever works because the cost lives almost entirely in three places: the LLM, text-to-speech, and speech-to-text.

~95% lower on LLM cost · Wishroll / Status · 500K+ daily users
~85% lower on TTS cost · Bible Chat · 800K+ daily users
~40% lower on TTS cost · Talkpal · 10M+ learners
250K+ users served at consumer cost · Luvu · realtime AI coaching

"We scaled from prototype to 1 million users in 19 days with over 20× cost reduction."

Fai Nur · CEO, Status

"Inworld reduced our TTS costs by about 10x, while remaining neutral or better on all the metrics we care about, which is pretty incredible."

Creston Brooks · Co-founder & CTO, Luvu

"We chose Inworld because of its low latency, high-quality output, multilingual support and competitive pricing."

Dimitri Dekanozishvili · Co-Founder, Talkpal

The biggest of these cuts is not a discount: Wishroll split one giant prompt into small, task-tuned models through the Router, and engagement held. The full mechanics are in the Router section below. Beyond these, some of the largest consumer apps we serve, including one processing more than 600 billion tokens a day, are moving their LLM workloads onto our realtime inference. Customer figures are self-reported.

Follow the money

The voice AI stack was built for the enterprise. Consumer was left to carry the cost.

The biggest labs make most of their money from businesses and developers, and it's not too hard to see why. Enterprises pay for productivity: a company will spend a hundred to a few hundred dollars per seat to save an employee time. On the other hand, only about 3% of consumer AI users ever pay, and those who do pay five to twenty dollars a month for something they enjoy.

So the models were built for that first buyer. The voice infrastructure especially was priced for enterprise and tuned for narrow jobs like support calls, IVR, and transcription, not for the range of things consumers actually do. And for a consumer, almost the entire cost of a session is the AI itself, split across the LLM, text-to-speech, and speech-to-text. Pricing that works for a per-seat tool doesn't work for an app that pays for every minute someone uses it.

Follow the money inside one live conversation and nearly every cent lands in the same place: the models talking, listening, and thinking, second by second, for everyone you are lucky enough to keep engaged.

[Chart: Most of the cost is the AI itself. What a live session's serving cost is made of. Series: LLM, Text-to-speech, Speech-to-text.]

Illustrative cost mix for a typical conversational voice AI session. The whole serving cost is AI inference, split across three layers.

Whoever owns the LLM, the text-to-speech, and the speech-to-text owns almost the whole cost of a session, which means they, not you, decide whether an app built for the full breadth of human experience can afford to grow.

Why we are doing this

AI should reach everyone, not only the enterprise.

We are a research lab, and we measure this technology by the breadth of human experience it reaches, not a handful of business tasks. If only enterprise economics work, only enterprise problems get built. We would rather a wider range of developers serve a wider range of people: the companion at midnight, the tutor, the coach, the daily check-in.

We can lower the wall because our research delivers both quality and efficiency, and we own the layers where cost adds up. We pass those savings on instead of marking them up.

Enterprise voice covers one narrow band of what people actually do with their voice. Here is the whole spectrum a consumer developer has to serve.

[Chart: AI should reach the breadth of human experience. Enterprise voice is tuned for a narrow set of jobs, even where its volume is large. Series: The full breadth (consumer), Enterprise voice (one slice). Illustrative.]

That whole curve is who this price cut is for.

What we did

Our API prices drop by half or more for most developers, across the whole stack.

New pricing is live today for every developer. For most developers it drops by half or more across every layer of a voice app, and it falls further as you scale. Here is each layer, what it does for a consumer app, and where it lands.

[Chart: Text-to-speech price vs the market, in $ per 1M characters. Inworld shown on the Growth plan ($25 to $12.50, ~50% off). Series: Other providers, Inworld.]

Provider API rates, June 2026. Inworld on the Growth plan; $10 at enterprise scale. *Gemini 3.1 Flash TTS is billed by audio output tokens; ~$180/1M is the effective rate on a typical query. †Cartesia estimated from published tier pricing.

Text-to-speech

Top-ranked voice, a fraction of the cost

Realtime TTS-2 is top-ranked on independent speech benchmarks. It supports steering and conversational context, and a single voice works across languages, so you can open new markets without switching models. It answers in realtime, and it costs a fraction of the comparable premium voices.

Down to ~$10 / 1M chars

[Chart: Speech-to-text price vs the market, in $ per hour, streaming. Series: Other providers, Inworld (was), Inworld (now).]

Source: provider pricing, June 2026; Inworld new pricing. On-demand $0.15/hr, down to $0.10 at scale.

Speech-to-text

Fast, accurate, among the lowest cost

Our speech-to-text ranks top-4 in quality and reads emotion, accent, and intent as the user speaks, so the app responds to a person. It handles long, continuous sessions that call-center transcribers can't, and it now costs less than every major streaming rate we benchmark.

Down to ~$0.10 / hour

[Chart: A gateway charges you more. What a typical 5% gateway markup costs per year, against $0 for the Inworld Router. Series: 5% gateway markup (per year), Inworld Router: $0.]

A typical gateway adds ~5% on credit purchases. The Inworld Router passes routed models through at no markup.

The LLM, through the Router

Hundreds of models, no markup

The LLM is a major line. Through the Router you reach 220+ models from every major lab and route to the best one for each job with a config change, not a migration. Routed third-party models carry no markup, where a typical gateway adds about 5%, which compounds into real money at scale. And the biggest wins come from decomposition: we helped one social app split one giant prompt into small, task-tuned models, cutting AI cost about 95% while engagement held.
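For a sense of how a 5% markup compounds, here is a small sketch. The 5% figure is the one quoted above; the monthly spend is a hypothetical input, not a customer number, and the function name is ours.

```python
GATEWAY_MARKUP = 0.05  # "about 5%", the typical gateway markup cited in this post


def annual_markup_cost(monthly_model_spend: float, markup: float = GATEWAY_MARKUP) -> float:
    """Extra dollars paid per year when a gateway adds `markup` on top of model spend."""
    return monthly_model_spend * markup * 12


if __name__ == "__main__":
    spend = 50_000  # hypothetical monthly spend on routed models, in dollars
    print(f"Markup per year at ${spend:,}/mo: ${annual_markup_cost(spend):,.0f}")
    print(f"Markup per year with no markup: ${annual_markup_cost(spend, 0.0):,.0f}")
```

At a hypothetical $50,000 a month in routed model spend, a 5% markup comes to about $30,000 a year. With no markup, that line is zero.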

No markup · up to ~95% lower

[Chart: Up to half the cost for your LLMs. The same top open models, up to 50% below the public third-party rate. Series: You pay, You save.]

Inworld realtime inference serves the top open models up to 50% below what you would pay elsewhere, with better latency and reliability.

Realtime inference, a new product

Optimized open models, up to 50% below the public rate

Realtime inference is a new product. The same team that tuned our voice models now hosts the top open models consumer apps run in production and serves them up to 50% below the public rate, with better latency and reliability, accessed through the Router. Three of the ten highest-volume consumer apps we work with are already moving their LLM workloads onto it.

Up to 50% off, better latency

[Chart: When fixed compute wins. Variable token cost vs dedicated GPUs as volume grows (illustrative). Series: Per-token (variable), Dedicated GPU (fixed, from $5/GPU-hr).]

Source: Illustrative; Inworld $5/GPU-hr, GCP H100 ~$11/GPU-hr.

Compute

Fixed GPUs when you reach real scale

At the largest scale, per-token billing is not always the right shape. You can move to dedicated GPUs from $5 per GPU-hour, less than half the on-demand rate of a hyperscaler, and transition at the crossover where fixed compute beats variable.
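The crossover can be estimated with a short calculation. The sketch below uses the $5 and ~$11 per GPU-hour figures from this post. The fleet size and the per-token price are hypothetical inputs, and it assumes the fleet runs all month and has enough throughput to serve the break-even volume, which is not modeled here.

```python
HOURS_PER_MONTH = 730             # average hours in a month (8,760 / 12)
INWORLD_GPU_HOURLY = 5.00         # "from $5 per GPU-hour"
HYPERSCALER_GPU_HOURLY = 11.00    # GCP H100 on-demand figure cited in the chart source


def fixed_monthly_cost(gpus: int, hourly: float = INWORLD_GPU_HOURLY) -> float:
    """Monthly cost of a dedicated fleet that runs around the clock."""
    return gpus * hourly * HOURS_PER_MONTH


def variable_monthly_cost(tokens: float, price_per_million: float) -> float:
    """Monthly cost of the same traffic on per-token billing."""
    return tokens / 1_000_000 * price_per_million


def breakeven_tokens(gpus: int, price_per_million: float) -> float:
    """Monthly token volume above which the fixed fleet is cheaper."""
    return fixed_monthly_cost(gpus) / price_per_million * 1_000_000


if __name__ == "__main__":
    gpus, token_price = 8, 0.50  # hypothetical fleet size and $ per 1M tokens
    print(f"Fixed cost: ${fixed_monthly_cost(gpus):,.0f}/mo")
    print(f"Break-even: {breakeven_tokens(gpus, token_price) / 1e9:.1f}B tokens/mo")
```

With these hypothetical inputs, eight GPUs cost $29,200 a month, so fixed compute wins once monthly volume passes roughly 58.4 billion tokens. Your own crossover depends on the per-token price you pay today and on how fully you can keep the GPUs utilized.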

From $5 / GPU-hour

And it compounds

Your unit price gets cheaper the more you scale.

Today's prices are the ceiling, not the floor. As your total spend grows, the unit price of every layer falls. We own the layers where cost adds up, and scale makes us more efficient, not less. Your unit price drops by tier, all the way to enterprise. Shown here for Realtime TTS-2; speech-to-text and the LLM follow the same shape.

[Chart: Your unit price drops as you scale. Realtime TTS-2, $ per 1M characters by monthly spend. Series: Realtime TTS-2 unit price, ElevenLabs API ~$100/1M, flat (off scale).]

Inworld pricing, June 2026. The discounts arrive at modest spend: a $300/mo subscription covers about 20M characters at $15/1M, $1,500/mo covers about 120M at $12.50, and enterprise commitments step down from $10 toward $5 at the largest spend levels (illustrative). For comparison, the ElevenLabs API standard rate is ~$100 / 1M characters (standard tier, off this scale).
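The coverage figures in that note follow from dividing spend by the unit rate. A minimal sketch, with a helper name of our own choosing:

```python
def characters_covered(monthly_spend: float, rate_per_million: float) -> int:
    """Characters a monthly budget buys at a given $ per 1M characters."""
    return round(monthly_spend / rate_per_million * 1_000_000)


if __name__ == "__main__":
    # (monthly spend, $ per 1M characters), as quoted in the note above
    for spend, rate in [(300, 15.00), (1_500, 12.50), (300, 100.00)]:
        millions = characters_covered(spend, rate) / 1_000_000
        print(f"${spend}/mo at ${rate}/1M covers {millions:.0f}M characters")
```

The same $300 a month that covers about 20M characters at $15/1M would cover about 3M at a ~$100/1M rate.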

You saw the shape of this curve at the top of the post. Here it is with your spend on it: the total grows far slower than on a flat rate. You choose how to buy, trading flexibility for a lower, more predictable rate, up to about 90% off with fixed compute for the largest workloads.

[Chart: What you actually spend. Total monthly cost at increasing volume. Series: Inworld (tiered), Illustrative flat rate ($30 / 1M).]

Illustrative. Inworld spend-tier pricing ($15 at $300/mo, $12.50 at $1,500/mo, $10 at enterprise scale, per 1M characters) vs a flat $30 / 1M rate.

[Chart: More commitment, bigger discount. Maximum discount by how you buy. Axis: Max discount vs on-demand.]

Inworld pricing tiers: subscriptions reach 50% off list (TTS-2 $25 to $12.50 on Growth), enterprise commits reach 80% (TTS-1.5 Mini $25 to $5), fixed compute up to ~90% (illustrative, at full utilization vs per-token public rates).
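The percentages above come from list price and new price. The sketch below shows that conversion, plus the related one for the "N×" reductions quoted by customers earlier in the post. Both helper names are ours.

```python
def discount_pct(list_price: float, new_price: float) -> float:
    """Percent off list when the price moves from list_price to new_price."""
    return (list_price - new_price) / list_price * 100


def reduction_multiple_to_pct(multiple: float) -> float:
    """Convert an 'N times cheaper' claim into a percent reduction."""
    return (1 - 1 / multiple) * 100


if __name__ == "__main__":
    print(f"TTS-2 on Growth, $25 to $12.50: {discount_pct(25.00, 12.50):.0f}% off")
    print(f"TTS-1.5 Mini enterprise, $25 to $5: {discount_pct(25.00, 5.00):.0f}% off")
    print(f"A 20x reduction: {reduction_multiple_to_pct(20):.0f}% lower")
    print(f"A 10x reduction: {reduction_multiple_to_pct(10):.0f}% lower")
```

So $25 to $12.50 is 50% off, $25 to $5 is 80% off, and a 20× reduction is the same thing as a bill about 95% lower.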

Discounts on top of a high base never fixed the math. Discounts on top of a structurally lower base do, and fixed compute is the point where the line finally flattens. Pick the shape that matches your stage; the rate only moves one way as you grow.

Lower cost is the front door. The rest is built for consumer too.

Human feel

Voice profiling reads emotion and intent as the user speaks, and steerable, audio-native TTS-2 answers in kind, so a session adapts to the person.

Modularity

Reach any model you need through one API, and swap it with a config change instead of a migration.

Unified billing and management

One bill across every layer, volume discounts that apply across the stack, and per-team and per-user controls.

Experiment freely

A/B test models and variants on live traffic to find what improves both the experience and the margin.

Our commitment

Building the stack consumer AI deserves.

The price cut is the visible part. Underneath it is a stack we build and own end to end, where each layer rests on the one below. Read it from the surface down.

Surface to foundation:

Layer 01 · APIs · Closest to the user

Simple, reliable APIs

Voice that feels human · Realtime latency · One cross-lingual voice · The control developers need

Voice profiling reads the user, and steering makes responses expressive and empathetic. Realtime latency is the bar for engaging a consumer, and one cross-lingual voice reaches users across languages and markets, all with the control developers need at a cost that works.

Layer 02 · Inference · Where the price cut comes from

Optimized inference and model management

Deep optimization on top open models · Performance, cost, reliability · Universal access via the Router · Unified billing and cost control

Deep optimization applied to the top open models consumer apps run in production means better performance, cost, and reliability, with universal access through the Router and unified billing across every model.

Layer 03 · Research · The bedrock

Research to serve the breadth of human experience

World-class research team · Proprietary training data · Years of R&D · Built for the diversity of human experience

Where others narrow their focus, we want to enable the diversity of users and use cases that consumer represents, across voice, language models, and code. The breadth is the point, and it is the layer everything above is built to serve.

No single layer is the moat. The integrated whole is.

We are taking on the cost so consumer developers can win.

New pricing is live today for every developer. We build the best realtime models for voice and speech, optimize the inference that serves the LLMs you run, and keep lowering the cost, so a wider diversity of developers can bring AI to everyone.


Questions

Frequently asked.

How much are the new prices?

Realtime TTS-2 drops to about $10 per million characters at scale (TTS-1.5 Mini to about $5), speech-to-text falls to about $0.10 per hour, and LLMs run at cost through the Router. Every rate falls further as you scale, and most consumer apps land on enterprise tiers. Exact figures are at inworld.ai/pricing.

How does volume pricing work?

Pricing steps down by tier as your total spend grows, and it falls per layer, so the unit price of speech-to-text, the LLM, and text-to-speech each drops as you scale. You choose how to buy: pay-as-you-go, a subscription, a volume commit for a lower rate, or fixed compute for the largest workloads. Spend on any layer counts toward one combined commit, so it lowers the rate on the others too.

Which models can I route to, and is there a markup?

The Router reaches hundreds of models from OpenAI, Anthropic, Google, Mistral, Meta, DeepSeek and more. Routed third-party models carry no markup, and you can also route to Inworld's first-party realtime inference. It is OpenAI-SDK compatible, so you swap a model with a config change, not a migration.

What is realtime inference?

First-party hosted, optimized versions of the top open models consumer apps run in production, accessed through the Router. The same team that optimized our voice models tuned the serving, so it runs up to 50% below the public third-party rate while improving latency and reliability.

Is the cheaper price lower quality?

No. Realtime TTS-2 is top-ranked on independent speech benchmarks and our speech-to-text ranks top-4, and that holds while the price falls. The efficiency comes from owning the layer and optimizing the inference, not from cutting corners. The lower price is structural where we own the layer; routed third-party models carry no markup, but their underlying economics are the provider's.

Can I move to fixed compute?

Yes. For high-scale workloads you can move off per-token billing to dedicated GPUs from $5 per GPU-hour, and transition when the economics make sense for you.

Do you price match?

Yes. For enterprise workloads we price match TTS models of comparable quality. Bring us your current bill and we will work through the numbers.

How do I start?

New pricing is live at inworld.ai/pricing for every developer, not gated to new accounts. Start free at inworld.ai, open a connection to the Realtime API, and the Router gives you unified billing, usage analytics, A/B testing, and cost control across every model.
