Realtime Router

One endpoint. The right model for every user and context.

A single endpoint that routes every request to the right model for that user. Optimize for cost, latency, engagement, revenue, or any metric you care about. 100+ models, no markup on provider rates.
0% markup · Hundreds of models · 1 line to integrate

Every routing use case, one endpoint

Route to different models based on user attributes like language, location, or subscription tier. Each user gets the model that fits them best.

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "User-Aware",
    "routes": [
      {
        "condition": { "cel_expression": "language == \"es\"" },
        "route": { "variants": [ { "variant": { "modelId": "openai/gpt-5.2" }, "weight": 100 } ] }
      },
      {
        "condition": { "cel_expression": "plan == \"free\"" },
        "route": { "variants": [ { "variant": { "modelId": "anthropic/claude-haiku-4-5" }, "weight": 100 } ] }
      }
    ],
    "defaultRoute": { "variants": [ { "variant": { "modelId": "anthropic/claude-sonnet-4-6" }, "weight": 100 } ] }
  }'

Pay provider rates. Nothing else.

Most routers take 5% on every call. Realtime Router charges zero markup during Research Preview. At $150K/month in LLM spend, that is $90K/year back in your pocket. After preview, bring your own keys for continued zero-markup access.

View pricing
Realtime Router: 0% markup on every provider call
Other routers: +5% markup

Route by who the user is, not just what they asked.

Pass language, country, tier, or emotion as metadata. CEL expressions evaluate conditions in real time and pick the right model per user. Update rules without redeploying.

  • CEL expressions on any metadata: language, tier, intent, session depth
  • Sticky user assignment for consistent experiences across sessions
  • Change routing logic without deploys or feature flags
Read routing docs
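A minimal sketch of what a metadata-aware request might look like from application code. The router model id (`inworld/my-router`) and the top-level `metadata` field name are illustrative assumptions, not confirmed API parameters; the CEL conditions shown above (e.g. `language == "es"`) evaluate against whatever metadata the request carries.

```python
import json

# Endpoint path follows the OpenAI-compatible base URL from this page.
ROUTER_URL = "https://api.inworld.ai/v1/chat/completions"

def build_routed_request(user_message: str, language: str, plan: str) -> dict:
    """Build a chat completions payload carrying per-user metadata
    for CEL routing conditions to evaluate (field names assumed)."""
    return {
        "model": "inworld/my-router",  # hypothetical router id
        "messages": [{"role": "user", "content": user_message}],
        "metadata": {"language": language, "plan": plan},  # assumed field name
    }

payload = build_routed_request("Hola, necesito ayuda", language="es", plan="free")
print(json.dumps(payload, indent=2))
```

With this payload, the first matching condition above would route the Spanish-language free-tier user before falling through to the default route.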
request + metadata
  language == "es" → GPT-5.2
  plan == "free" → Claude Haiku
  emotion == "frustrated" → Claude Sonnet
  default → Claude Sonnet 4.6

Change one line. Keep your entire codebase.

Swap your base URL to api.inworld.ai, update your API key, done. Full OpenAI and Anthropic SDK compatibility. No request or response changes.

  • OpenAI SDK: change base_url to https://api.inworld.ai/v1
  • Anthropic SDK: change base_url to https://api.inworld.ai/anthropic/v1
Migration guide
client.py
from openai import OpenAI

client = OpenAI(
    # before: base_url="https://api.openai.com/v1"
    base_url="https://api.inworld.ai/v1",
)

A/B test models on real users. No deploys.

Split traffic between Claude and GPT with sticky user assignment. Measure retention, satisfaction, or conversion per model. Ramp the winner to 100% via API or dashboard.

A/B testing docs
Live split: Claude Opus 4.6 50% / GPT-5.2 50%
TTFT: 320ms / 280ms
CSAT: 4.2 / 4.0
Cost: $0.003 / $0.004
*Illustrative data
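A 50/50 split like the one above can be expressed with the same routers schema shown in the earlier curl example (weighted `variants` on a route). A minimal sketch, with illustrative model ids and router name:

```python
import json

# Weighted variants on the default route express the traffic split;
# schema mirrors the routers API example shown earlier on this page.
ab_router = {
    "displayName": "Opus-vs-GPT",  # illustrative name
    "defaultRoute": {
        "variants": [
            {"variant": {"modelId": "anthropic/claude-opus-4-6"}, "weight": 50},
            {"variant": {"modelId": "openai/gpt-5.2"}, "weight": 50},
        ]
    },
}

weights = [v["weight"] for v in ab_router["defaultRoute"]["variants"]]
assert sum(weights) == 100  # the split must cover all traffic
print(json.dumps(ab_router, indent=2))
```

Ramping the winner to 100% is then a matter of updating the weights, with no redeploy of application code.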

Text and speech from one API call.

Add an audio parameter to any chat completions request. Get streamed text and speech in a single response. Route to the best LLM, then pipe straight into Realtime TTS. No second integration. No added latency.

  • Works with any model through any router configuration
  • #1 ranked TTS on Artificial Analysis
Explore Realtime TTS
Your app → Router → TTS

Single request body:
{
  "model": "inworld/my-router",
  "messages": [...],
  "audio": {
    "voice": "Sarah",
    "model": "inworld-tts-1.5-max"
  }
}
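The combined text-plus-speech body can be built like any other chat completions payload. A sketch that constructs the raw request shown above; whether a given SDK forwards the extra `audio` field unchanged is an assumption here, so this builds the dict directly:

```python
import json

def build_speech_request(messages: list) -> dict:
    """Attach the audio block from the example above to a standard
    chat completions payload (router id is illustrative)."""
    return {
        "model": "inworld/my-router",
        "messages": messages,
        "audio": {
            "voice": "Sarah",               # voice from the example above
            "model": "inworld-tts-1.5-max",
        },
    }

req = build_speech_request([{"role": "user", "content": "Read this aloud."}])
print(json.dumps(req, indent=2))
```

The same body works with any router configuration, so the text model can change per user while the TTS settings stay fixed.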

Your coding agent. Every model. No markup.

If a model goes down or rate-limits you, traffic reroutes instantly to the next best option. Your workflow never stops. Works with Cursor, Claude Code, Codex CLI, Aider, Continue, and Windsurf.

  • One env var to set up, no code changes required
  • Slash commands to switch models without leaving your editor
Explore vibe coding
slash-command routing
  /code → Claude Opus 4.6
  /review → DeepSeek V3.2
  /docs → Gemini 3 Flash
  default → MiniMax M2.5

From voice agents to vibe coding. One API, every pattern.

Production-ready out of the box

Routing is just the start. Everything else your production stack needs is built in.

Per-request observability

Every request logs the model selected, attempt chain, TTFT, token cost, and routing reason.

Mixpanel + BigQuery export

Stream request-level data to your analytics stack. Build dashboards on model performance and cost.

Prompt caching

Automatic caching for OpenAI, DeepSeek, Gemini. Explicit cache_control for Anthropic and Google.

Web search and grounding

Tool-based search via Exa and Google, or native provider grounding. Responses include citations.

Auto model selection

Set model to "auto" and sort by price, latency, throughput, intelligence, math, or coding benchmarks.

Full SSE streaming

Streaming for every model. Less than 5ms added latency to first token.

Bring your own keys

Use your own provider API keys for continued zero-markup access.

OpenAI + Anthropic SDK compatible

Drop-in replacement for both SDKs. Change your base URL and keep everything else.

Router vs. the alternatives

Capability | Direct APIs | Inworld Router | OpenRouter
--- | --- | --- | ---
Models available | 1 per provider | Hundreds, across all providers | 200+ across providers
Pricing markup | None | None (Research Preview) | ~5% commission
Context-aware routing (CEL) | Build it yourself | Built in | Not available
A/B testing with sticky users | Build it yourself | Built in | Not available
Automatic failover | Build it yourself | TTFT-based | Retry on error
Integrated TTS | Separate integration | Single API call | Not available
Per-request observability | Build it yourself | Built in + export | Basic logging
SDK compatibility | N/A (native) | OpenAI + Anthropic | OpenAI only
Caching | Provider-dependent | Implicit + explicit | Provider-dependent
Web search / grounding | Provider-dependent | Tool-based + native | Not available

FAQ

Can I migrate from OpenRouter or an Anthropic-based setup?
Yes. Migration guides are available for OpenRouter and Anthropic-based setups. The core change is updating your base_url and API key; your existing request structure stays the same.

Which models are available?
Router provides access to hundreds of models from leading providers, such as OpenAI, Anthropic, Google, and many more. You can see the full model list here.

How much does Router cost?
While Router is in Research Preview, you pay provider rates directly, with no markup or margin added. Rates for all models are available here.

Does Router add its own rate limits?
Realtime Router doesn't impose additional rate limits on top of providers. Provider-level rate limits are handled automatically by retrying with the next model in your fallback chain.

How is this different from other gateways?
Most gateways give you a unified API and basic fallback. Realtime Router offers more control and lets you run real experiments: conditional routing, dynamic tiering, traffic splitting by percentage, and sticky user assignment, with results pushed to your analytics platform of choice.

Start building with Realtime Router

One API key. Every model. No markup on provider rates.
Copyright © 2021-2026 Inworld AI