Get started
Realtime Router

One endpoint. The right model for every user and context.

One endpoint, 220+ models, zero markup. A typical gateway adds about 5%; the Router passes provider rates through at cost. Route every request to the right model for that user, optimized for cost, latency, engagement, revenue, or any metric you care about.
0%
Markup
220+
Models
1
Line to integrate

Every routing use case, one endpoint

Route to different models based on user attributes like language, location, or subscription tier. Each user gets the model that fits them best.

curl 'https://api.inworld.ai/router/v1/routers' \ -H "Authorization: Basic $INWORLD_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "displayName": "User-Aware", "routes": [ { "condition": { "cel_expression": "language == \"es\"" }, "route": { "variants": [ { "variant": { "modelId": "openai/gpt-5.2" }, "weight": 100 } ] } }, { "condition": { "cel_expression": "plan == \"free\"" }, "route": { "variants": [ { "variant": { "modelId": "anthropic/claude-haiku-4-5" }, "weight": 100 } ] } } ], "defaultRoute": { "variants": [ { "variant": { "modelId": "anthropic/claude-sonnet-4-6" }, "weight": 100 } ] } }'

Pay provider rates. Or less.

A typical gateway adds about 5% to every call. Realtime Router passes provider rates through at cost, with zero markup on routed third-party models. At $150K per month in LLM spend, that is $90K per year back. Route to Inworld's first-party realtime inference and top open models run up to 50% below the public third-party rate.

View pricing
Realtime Router
0%
markup on every provider call
Typical gateway
+5% markup

Route by who the user is, not just what they asked.

Pass language, country, tier, or emotion as metadata. CEL expressions evaluate conditions in real time and pick the right model per user. Update rules without redeploying.

  • CEL expressions on any metadata: language, tier, intent, session depth
  • Sticky user assignment for consistent experiences across sessions
  • Change routing logic without deploys or feature flags
Read routing docs
request + metadata
language == "es"
GPT-5.2
plan == "free"
Claude Haiku
emotion == "frustrated"
Claude Sonnet
default → Claude Sonnet 4.6

Change one line. Keep your entire codebase.

Swap your base URL to api.inworld.ai, update your API key, done. Full OpenAI and Anthropic SDK compatibility. No request or response changes.

  • OpenAI SDK: change base_url to https://api.inworld.ai/v1
  • Anthropic SDK: change base_url to https://api.inworld.ai/anthropic/v1
Migration guide
client.py
from openai import OpenAI
client = OpenAI(
base_url="https://api.openai.com/v1"
base_url="https://api.inworld.ai/v1"
)

A/B test models on real users. No deploys.

Split traffic between Claude and GPT with sticky user assignment. Measure retention, satisfaction, or conversion per model. Ramp the winner to 100% via API or dashboard.

A/B testing docs
Live split
Claude Opus 4.6
50%
GPT-5.2
50%
TTFT
320ms
/
280ms
CSAT
4.2
/
4.0
Cost
$0.003
/
$0.004
*Illustrative data

Text and speech from one API call.

Add an audio parameter to any chat completions request. Get streamed text and speech in a single response. Route to the best LLM, then pipe straight into Realtime TTS. No second integration. No added latency.

  • Works with any model through any router configuration
  • #1 realtime TTS on Artificial Analysis
Explore Realtime TTS
Your app
Router
TTS
Single request body
{
"model": "inworld/my-router",
"messages": [...],
"audio": {
"voice": "Sarah",
"model": "inworld-tts-2"
}
}

Your coding agent. Every model. No markup.

If a model goes down or rate-limits you, traffic reroutes instantly to the next best option. Your workflow never stops. Works with Cursor, Claude Code, Codex CLI, Aider, Continue, and Windsurf.

  • One env var to set up, no code changes required
  • Slash commands to switch models without leaving your editor
Explore vibe coding
slash-command routing
/code
Claude Opus 4.6
/review
DeepSeek V3.2
/docs
Gemini 3 Flash
default → MiniMax M2.5

From voice agents to vibe coding. One API, every pattern.

Production-ready out of the box

Routing is just the start. Everything else your production stack needs is built in.

Per-request observability

Every request logs the model selected, attempt chain, TTFT, token cost, and routing reason.

Mixpanel + BigQuery export

Stream request-level data to your analytics stack. Build dashboards on model performance and cost.

Prompt caching

Automatic caching for OpenAI, DeepSeek, Gemini. Explicit cache_control for Anthropic and Google.

Web search and grounding

Tool-based search via Exa and Google, or native provider grounding. Responses include citations.

Auto model selection

Set model to "auto" and sort by price, latency, throughput, intelligence, math, or coding benchmarks.

Full SSE streaming

Streaming for every model. Less than 5ms added latency to first token.

Bring your own keys

Bring your own provider keys if you prefer. Routed third-party models carry no markup either way.

OpenAI + Anthropic SDK compatible

Drop-in replacement for both SDKs. Change your base URL and keep everything else.

Router vs. the alternatives

Capability
Direct APIs
Inworld Router
OpenRouter
Models available
1 per provider
220+, across all providers
Hundreds across providers
Pricing markup
None
None. Provider rates at cost
5.5% fee on credit purchases
Context-aware routing (CEL)
Build it yourself
Built in
Not available
A/B testing with sticky users
Build it yourself
Built in
Not available
Automatic failover
Build it yourself
TTFT-based
Retry on error
Integrated TTS
Separate integration
Single API call
Not available
Per-request observability
Build it yourself
Built in + export
Basic logging
SDK compatibility
N/A (native)
OpenAI + Anthropic
OpenAI only
Caching
Provider-dependent
Implicit + explicit
Provider-dependent
Web search / grounding
Provider-dependent
Tool-based + native
Available

Stop paying gateway markup

A typical gateway taxes you about 5% on your LLM bill. The Inworld Router charges $0: transparent pass-through of provider rates, no hidden fees. And the markup is only the start. The biggest cuts come from routing intelligence: splitting one large prompt into small, task-tuned models for each request, not from discounts.
What a typical 5% gateway markup costs per year. The Inworld Router: $0.

A gateway charges you more

5% gateway markup (per year)Inworld Router: $0
A typical gateway adds ~5% on credit purchases. The Inworld Router passes routed models through at no markup.

“We scaled from prototype to 1 million users in 19 days with over 20x cost reduction.”

Fai Nur, CEO, Status · Self-reported

FAQ

Yes. Migration guides are available for OpenRouter and Anthropic-based setups. The core change is updating your base_url and API key, while your existing request structure stays the same.
Router provides access to hundreds of models from leading providers, such as OpenAI, Anthropic, Google, and many more. You can see the full model list here.
You pay provider rates directly: routed third-party models carry no markup, while a typical gateway adds about 5%. You can also route to Inworld's first-party realtime inference, where top open models run up to 50% below the public third-party rate. Rates for all models are available here.
Realtime Router itself doesn't impose additional rate limits on top of providers. Provider-level rate limits are handled automatically by retrying the next model in your fallback chain.
Most gateways give you a unified API and basic fallback. Realtime Router offers more control and lets you run real experiments: conditional routing, dynamic tiering, traffic splitting by percentage, sticky user assignment, with results pushed to your analytics platform of choice.

Start building with Realtime Router

One API key. Every model. No markup on provider rates.
Copyright © 2021-2026 Inworld AI
LLM Router: One API for 220+ AI Models