Cost Optimization

Cut your LLM bill up to 95% without losing quality

Route easy prompts to cheap models and reserve the best models for the requests that matter. One endpoint, conditional rules, and semantic caching. Customers cut production AI spend by 40-95%.
Cost routing
Request
CEL rule: complexity == 'low'

Support ticket · low complexity

Routed
vs premium tier -91% cost

groq/gpt-oss-120b · $0.001 · 180ms

Powered by
Router

Cut spend without cutting quality.

Conditional routing, semantic caching, cost-aware model selection, and pass-through pricing: four levers that shrink the LLM bill by 40-70%.
40-70% typical savings

Cut the bill without cutting the quality.

Customer baselines land at 40-70% spend reduction with no quality loss. The savings come from picking the right model per request, not downgrading every call.
Router · production baseline
40-70%
Spend cut with no quality loss
Published customer baselines on live LLM traffic.
Route by difficulty

Easy prompts hit cheap models. Escalations hit the best ones.

Conditional routing classifies each request per turn. A FAQ lands on Groq for a fraction of a cent; an escalation routes to Claude Sonnet. Same user, right-sized model.
Route easy to cheap, hard to best
Simple FAQ
groq/gpt-oss-120b
$0.001
Support
openai/gpt-5.4
$0.004
Escalation
anthropic/claude-sonnet-4-6
$0.012
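The routing table above can be sketched as a first-match rule list. This is a conceptual illustration of the decision Router makes server-side, not its implementation: the predicates mirror CEL expressions like `complexity == 'low'`, and the per-request costs are the figures from the table.

```python
# Conceptual sketch of conditional routing: first matching rule wins.
# Models and per-request costs are from the table above; Router evaluates
# real CEL expressions server-side rather than Python lambdas.

ROUTES = [
    # (predicate over request metadata, model, est. cost per request)
    (lambda r: r["complexity"] == "low",    "groq/gpt-oss-120b",           0.001),
    (lambda r: r["complexity"] == "medium", "openai/gpt-5.4",              0.004),
    (lambda r: True,                        "anthropic/claude-sonnet-4-6", 0.012),  # fallback: escalation
]

def route(request: dict) -> tuple[str, float]:
    """Return the first matching (model, est_cost) for a request."""
    for predicate, model, cost in ROUTES:
        if predicate(request):
            return model, cost
    raise ValueError("no route matched")
```

The fallback rule at the bottom guarantees every request routes somewhere, so an unclassified escalation still gets the strongest model.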
Semantic cache hits cost zero

Paraphrased prompts hit the same cached answer.

Exact-match caching is free by default. Semantic caching matches by meaning, so "what's your return policy" and "how do returns work" share one response.
Semantic cache · zero tokens on a hit
73%
Hit rate
24h
+412M tokens saved
Paraphrased prompts match by meaning and skip the model entirely.
One knob per request

Say what you care about and the model shows up.

Cheapest answer for batch work. Fastest for voice. Smartest for hard reasoning. You tell Router what matters per request and it picks the model that hits it.
Sort by price, latency, or intelligence
Quality
Cost
gpt-oss-120b
gpt-5.4-mini
gpt-5.4
claude-sonnet-4-6
One field per request. Router picks the model that hits your constraint.
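The single-constraint selection above can be sketched as a lookup over a model catalog. The model names and prices come from this page; the latency and quality figures are made-up placeholders, and the `optimize_for` field name is an assumption, not Router's request schema.

```python
# Sketch of "one knob per request": pick a model by the one constraint the
# caller cares about. Prices are from this page; latency and quality
# numbers are illustrative placeholders.

MODELS = [
    # (name, $ per typical request, latency ms, quality score)
    ("groq/gpt-oss-120b",           0.001, 180, 0.72),
    ("gpt-5.4-mini",                0.002, 300, 0.80),
    ("openai/gpt-5.4",              0.004, 600, 0.90),
    ("anthropic/claude-sonnet-4-6", 0.012, 900, 0.95),
]

def pick(optimize_for: str) -> str:
    """Cheapest for batch work, fastest for voice, smartest for hard reasoning."""
    if optimize_for == "price":
        return min(MODELS, key=lambda m: m[1])[0]
    if optimize_for == "latency":
        return min(MODELS, key=lambda m: m[2])[0]
    if optimize_for == "intelligence":
        return max(MODELS, key=lambda m: m[3])[0]
    raise ValueError(f"unknown constraint: {optimize_for}")
```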
Pay the provider, not the middleman

Same rate as going direct, minus the direct integration.

You pay the provider rate, nothing extra. During Research Preview, routing, caching, and observability come free on top of the bill you'd already be paying.
Your bill · receipt
openai/gpt-5.4
1.2k tokens
$0.004
anthropic/claude-sonnet-4-6
890 tokens
$0.012
groq/gpt-oss-120b
1.1k tokens
$0.001
Router markup
$0.00
You pay the provider rate. Router adds nothing.
Attribute every cent

Know which feature, which tier, which user is driving spend.

Every request logs user, tier, feature tag, tokens, and cost. Finance can attribute blended cost per active user. No blind billing, no surprises.
Spend by tier
last 7 days
Enterprise
41K calls
$1,242
Premium
28K calls
$380
Free
12K calls
$0.08
Every request logs user, tier, feature, tokens, and cost.
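The roll-up finance runs can be sketched directly over the logged fields. The log rows below are invented examples; only the field names (user, tier, feature, tokens, cost) come from the text.

```python
# Sketch of cost attribution: group logged request costs by any attribute.
# Rows are illustrative; every real request carries these fields.
from collections import defaultdict

logs = [
    {"user": "u1", "tier": "enterprise", "feature": "chat",   "tokens": 1200, "cost": 0.004},
    {"user": "u2", "tier": "enterprise", "feature": "search", "tokens": 890,  "cost": 0.012},
    {"user": "u3", "tier": "free",       "feature": "chat",   "tokens": 1100, "cost": 0.001},
]

def spend_by(key: str) -> dict[str, float]:
    """Total cost grouped by a logged attribute, e.g. 'tier' or 'feature'."""
    totals: dict[str, float] = defaultdict(float)
    for row in logs:
        totals[row[key]] += row["cost"]
    return dict(totals)
```

The same grouping over `user` gives blended cost per active user; over `feature`, it shows which product surface drives spend.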

FAQ

How much will I actually save?
Published customer baselines show 40-70% savings on production LLM spend after enabling conditional routing plus semantic caching. Specific savings depend on workload: cache-friendly traffic saves more, heavy-reasoning traffic saves less. Router is free during Research Preview, so you only pay underlying model costs.

Does routing to cheaper models hurt quality?
No. Conditional routing sends easy prompts to cheap models and escalations to the best ones. Easy requests keep quality because they never needed the premium model; hard ones keep quality because they get the model that can handle them. The failure mode is downgrading everything, and we explicitly don't do that.

What is CEL?
CEL (Common Expression Language) is the rule language Router uses for conditional routing. You write expressions like `complexity == 'low'` or `user.tier == 'premium'` and Router evaluates them per request to pick the model. Human-readable, no custom DSL to learn.

How does semantic caching work?
Exact-match caching is free and always on. Semantic caching (opt-in per request) matches by meaning, so two differently worded requests for the same answer hit the same cached response. Production traffic typically caches ~73% of repeated intents.

What does Router log for each request?
Every request logs model, provider, tokens, cost, cache status, and user attribution (via the `user` field on the request). Query the logs API or view live tail in the portal. Export to your FinOps or billing system.

What's the markup?
Zero during Research Preview. Pass-through pricing means you pay the underlying provider rate through Inworld. Most LLM gateways add 5-15%; Router adds 0%.

Can I keep my existing OpenAI SDK code?
Yes. Router is OpenAI SDK compatible: change the base URL, keep your code. See the Router docs for the drop-in pattern.

Where can I see pricing?
Router pricing and model rates live on the pricing page. For workload-specific estimates, contact the team with your current monthly spend and we'll model the expected reduction.

40-70% less LLM spend. Same answers.

Route by difficulty. Cache by meaning. Sort by price. Zero gateway markup during Research Preview.
Copyright © 2021-2026 Inworld AI