Realtime Router

One endpoint. The right model for every user and context.

One endpoint, 220+ models, zero markup. A typical gateway adds about 5%; the Router passes provider rates through at cost. Route every request to the right model for that user, optimized for cost, latency, engagement, revenue, or any metric you care about.

Get Started Explore Models Read the docs

Ready to build? Get your API key in under a minute.

Markup

220+

Models

Line to integrate

Every routing use case, one endpoint

Route to different models based on user attributes like language, location, or subscription tier. Each user gets the model that fits them best.

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "User-Aware",
    "routes": [
      {
        "condition": {
          "cel_expression": "language == \"es\""
        },
        "route": {
          "variants": [
            { "variant": { "modelId": "openai/gpt-5.2" }, "weight": 100 }
          ]
        }
      },
      {
        "condition": {
          "cel_expression": "plan == \"free\""
        },
        "route": {
          "variants": [
            { "variant": { "modelId": "anthropic/claude-haiku-4-5" }, "weight": 100 }
          ]
        }
      }
    ],
    "defaultRoute": {
      "variants": [
        { "variant": { "modelId": "anthropic/claude-sonnet-4-6" }, "weight": 100 }
      ]
    }
  }'

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "Context-Aware",
    "routes": [
      {
        "condition": {
          "cel_expression": "emotion == \"frustrated\""
        },
        "route": {
          "variants": [
            { "variant": { "modelId": "anthropic/claude-sonnet-4-6" }, "weight": 100 }
          ]
        }
      },
      {
        "condition": {
          "cel_expression": "messages.last().content.size() < 50"
        },
        "route": {
          "variants": [
            { "variant": { "modelId": "anthropic/claude-haiku-4-5" }, "weight": 100 }
          ]
        }
      }
    ],
    "defaultRoute": {
      "variants": [
        { "variant": { "modelId": "openai/gpt-5.2" }, "weight": 100 }
      ]
    }
  }'

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "name": "ab-test",
    "displayName": "A/B Test",
    "defaultRoute": {
        "routeId": "default",
        "variants": [
            { "variant": { "variantId": "anthropic", "modelId": "anthropic/claude-opus-4-6" }, "weight": 50 },
            { "variant": { "variantId": "openai", "modelId": "openai/gpt-5.2" }, "weight": 50 }
        ]
    }
}'

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "name": "cost-optimizer",
    "routes": [
        {
            "route": { "routeId": "simple", "variants": [{ "variant": { "variantId": "gpt-5-nano", "modelId": "openai/gpt-5-nano" }, "weight": 100 }] },
            "condition": { "cel_expression": "messages.last().content.size() < 50" }
        }
    ],
    "defaultRoute": {
        "routeId": "complex",
        "variants": [
            { "variant": { "variantId": "gpt-5.2", "modelId": "openai/gpt-5.2" }, "weight": 100 }
        ]
    }
}'

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "User-Aware",
    "routes": [
      {
        "condition": {
          "cel_expression": "language == \"es\""
        },
        "route": {
          "variants": [
            { "variant": { "modelId": "openai/gpt-5.2" }, "weight": 100 }
          ]
        }
      },
      {
        "condition": {
          "cel_expression": "plan == \"free\""
        },
        "route": {
          "variants": [
            { "variant": { "modelId": "anthropic/claude-haiku-4-5" }, "weight": 100 }
          ]
        }
      }
    ],
    "defaultRoute": {
      "variants": [
        { "variant": { "modelId": "anthropic/claude-sonnet-4-6" }, "weight": 100 }
      ]
    }
  }'

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "Context-Aware",
    "routes": [
      {
        "condition": {
          "cel_expression": "emotion == \"frustrated\""
        },
        "route": {
          "variants": [
            { "variant": { "modelId": "anthropic/claude-sonnet-4-6" }, "weight": 100 }
          ]
        }
      },
      {
        "condition": {
          "cel_expression": "messages.last().content.size() < 50"
        },
        "route": {
          "variants": [
            { "variant": { "modelId": "anthropic/claude-haiku-4-5" }, "weight": 100 }
          ]
        }
      }
    ],
    "defaultRoute": {
      "variants": [
        { "variant": { "modelId": "openai/gpt-5.2" }, "weight": 100 }
      ]
    }
  }'

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "name": "ab-test",
    "displayName": "A/B Test",
    "defaultRoute": {
        "routeId": "default",
        "variants": [
            { "variant": { "variantId": "anthropic", "modelId": "anthropic/claude-opus-4-6" }, "weight": 50 },
            { "variant": { "variantId": "openai", "modelId": "openai/gpt-5.2" }, "weight": 50 }
        ]
    }
}'

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "name": "cost-optimizer",
    "routes": [
        {
            "route": { "routeId": "simple", "variants": [{ "variant": { "variantId": "gpt-5-nano", "modelId": "openai/gpt-5-nano" }, "weight": 100 }] },
            "condition": { "cel_expression": "messages.last().content.size() < 50" }
        }
    ],
    "defaultRoute": {
        "routeId": "complex",
        "variants": [
            { "variant": { "variantId": "gpt-5.2", "modelId": "openai/gpt-5.2" }, "weight": 100 }
        ]
    }
}'

Every routing use case, one endpoint

Route to different models based on user attributes like language, location, or subscription tier. Each user gets the model that fits them best.

Pay provider rates. Or less.

A typical gateway adds about 5% to every call. Realtime Router passes provider rates through at cost, with zero markup on routed third-party models. At $150K per month in LLM spend, that is $90K per year back. Route to Inworld's first-party realtime inference and top open models run up to 50% below the public third-party rate.

View pricing

Realtime Router

markup on every provider call

Typical gateway

+5% markup

Pay provider rates. Or less.

View pricing

Realtime Router

markup on every provider call

Typical gateway

+5% markup

Route by who the user is, not just what they asked.

Pass language, country, tier, or emotion as metadata. CEL expressions evaluate conditions in real time and pick the right model per user. Update rules without redeploying.

CEL expressions on any metadata: language, tier, intent, session depth
Sticky user assignment for consistent experiences across sessions
Change routing logic without deploys or feature flags

Read routing docs

request + metadata

language == "es"

GPT-5.2

plan == "free"

Claude Haiku

emotion == "frustrated"

Claude Sonnet

default → Claude Sonnet 4.6

request + metadata

language == "es"

GPT-5.2

plan == "free"

Claude Haiku

emotion == "frustrated"

Claude Sonnet

default → Claude Sonnet 4.6

Route by who the user is, not just what they asked.

Pass language, country, tier, or emotion as metadata. CEL expressions evaluate conditions in real time and pick the right model per user. Update rules without redeploying.

CEL expressions on any metadata: language, tier, intent, session depth
Sticky user assignment for consistent experiences across sessions
Change routing logic without deploys or feature flags

Read routing docs

Change one line. Keep your entire codebase.

Swap your base URL to api.inworld.ai, update your API key, done. Full OpenAI and Anthropic SDK compatibility. No request or response changes.

OpenAI SDK: change base_url to https://api.inworld.ai/v1
Anthropic SDK: change base_url to https://api.inworld.ai/anthropic/v1

Migration guide

client.py

from openai import OpenAI
client = OpenAI(
base_url="https://api.openai.com/v1"
base_url="https://api.inworld.ai/v1"
)

Change one line. Keep your entire codebase.

Swap your base URL to api.inworld.ai, update your API key, done. Full OpenAI and Anthropic SDK compatibility. No request or response changes.

OpenAI SDK: change base_url to https://api.inworld.ai/v1
Anthropic SDK: change base_url to https://api.inworld.ai/anthropic/v1

Migration guide

client.py

from openai import OpenAI
client = OpenAI(
base_url="https://api.openai.com/v1"
base_url="https://api.inworld.ai/v1"
)

A/B test models on real users. No deploys.

Split traffic between Claude and GPT with sticky user assignment. Measure retention, satisfaction, or conversion per model. Ramp the winner to 100% via API or dashboard.

A/B testing docs

Live split

Claude Opus 4.6

50%

GPT-5.2

50%

TTFT

320ms

280ms

CSAT

4.2

4.0

Cost

$0.003

$0.004

*Illustrative data

Live split

Claude Opus 4.6

50%

GPT-5.2

50%

TTFT

320ms

280ms

CSAT

4.2

4.0

Cost

$0.003

$0.004

*Illustrative data

A/B test models on real users. No deploys.

Split traffic between Claude and GPT with sticky user assignment. Measure retention, satisfaction, or conversion per model. Ramp the winner to 100% via API or dashboard.

A/B testing docs

Text and speech from one API call.

Add an audio parameter to any chat completions request. Get streamed text and speech in a single response. Route to the best LLM, then pipe straight into Realtime TTS. No second integration. No added latency.

Works with any model through any router configuration
#1 realtime TTS on Artificial Analysis

Explore Realtime TTS

Your app

Router

TTS

Single request body

{
"model": "inworld/my-router",
"messages": [...],
"audio": {
"voice": "Sarah",
"model": "inworld-tts-2"
}
}

Text and speech from one API call.

Works with any model through any router configuration
#1 realtime TTS on Artificial Analysis

Explore Realtime TTS

Your app

Router

TTS

Single request body

{
"model": "inworld/my-router",
"messages": [...],
"audio": {
"voice": "Sarah",
"model": "inworld-tts-2"
}
}

Your coding agent. Every model. No markup.

If a model goes down or rate-limits you, traffic reroutes instantly to the next best option. Your workflow never stops. Works with Cursor, Claude Code, Codex CLI, Aider, Continue, and Windsurf.

One env var to set up, no code changes required
Slash commands to switch models without leaving your editor

Explore vibe coding

slash-command routing

/code

Claude Opus 4.6

/review

DeepSeek V3.2

/docs

Gemini 3 Flash

default → MiniMax M2.5

slash-command routing

/code

Claude Opus 4.6

/review

DeepSeek V3.2

/docs

Gemini 3 Flash

default → MiniMax M2.5

Your coding agent. Every model. No markup.

If a model goes down or rate-limits you, traffic reroutes instantly to the next best option. Your workflow never stops. Works with Cursor, Claude Code, Codex CLI, Aider, Continue, and Windsurf.

One env var to set up, no code changes required
Slash commands to switch models without leaving your editor

Explore vibe coding

From voice agents to vibe coding. One API, every pattern.

Voice Agents

Route by caller language and intent. Pipe responses into TTS. Failover keeps calls alive.

Production-ready out of the box

Routing is just the start. Everything else your production stack needs is built in.

Per-request observability

Every request logs the model selected, attempt chain, TTFT, token cost, and routing reason.

Mixpanel + BigQuery export

Stream request-level data to your analytics stack. Build dashboards on model performance and cost.

Prompt caching

Automatic caching for OpenAI, DeepSeek, Gemini. Explicit cache_control for Anthropic and Google.

Web search and grounding

Tool-based search via Exa and Google, or native provider grounding. Responses include citations.

Auto model selection

Set model to "auto" and sort by price, latency, throughput, intelligence, math, or coding benchmarks.

Full SSE streaming

Streaming for every model. Less than 5ms added latency to first token.

Bring your own keys

Bring your own provider keys if you prefer. Routed third-party models carry no markup either way.

OpenAI + Anthropic SDK compatible

Drop-in replacement for both SDKs. Change your base URL and keep everything else.

Router vs. the alternatives

Capability	Direct APIs	Inworld Router	OpenRouter
Models available	1 per provider	220+, across all providers	Hundreds across providers
Pricing markup	None	None. Provider rates at cost	5.5% fee on credit purchases
Context-aware routing (CEL)	Build it yourself	Built in	Not available
A/B testing with sticky users	Build it yourself	Built in	Not available
Automatic failover	Build it yourself	TTFT-based	Retry on error
Integrated TTS	Separate integration	Single API call	Not available
Per-request observability	Build it yourself	Built in + export	Basic logging
SDK compatibility	N/A (native)	OpenAI + Anthropic	OpenAI only
Caching	Provider-dependent	Implicit + explicit	Provider-dependent
Web search / grounding	Provider-dependent	Tool-based + native	Available

Stop paying gateway markup

A typical gateway taxes you about 5% on your LLM bill. The Inworld Router charges $0: transparent pass-through of provider rates, no hidden fees. And the markup is only the start. The biggest cuts come from routing intelligence: splitting one large prompt into small, task-tuned models for each request, not from discounts.

What a typical 5% gateway markup costs per year. The Inworld Router: $0.

A gateway charges you more

5% gateway markup (per year)Inworld Router: $0

A typical gateway adds ~5% on credit purchases. The Inworld Router passes routed models through at no markup.

“We scaled from prototype to 1 million users in 19 days with over 20x cost reduction.”

Fai Nur, CEO, Status · Self-reported

FAQ

Yes. Migration guides are available for OpenRouter and Anthropic-based setups. The core change is updating your base_url and API key, while your existing request structure stays the same.

Router provides access to hundreds of models from leading providers, such as OpenAI, Anthropic, Google, and many more. You can see the full model list here.

You pay provider rates directly: routed third-party models carry no markup, while a typical gateway adds about 5%. You can also route to Inworld's first-party realtime inference, where top open models run up to 50% below the public third-party rate. Rates for all models are available here.

Realtime Router itself doesn't impose additional rate limits on top of providers. Provider-level rate limits are handled automatically by retrying the next model in your fallback chain.

Most gateways give you a unified API and basic fallback. Realtime Router offers more control and lets you run real experiments: conditional routing, dynamic tiering, traffic splitting by percentage, sticky user assignment, with results pushed to your analytics platform of choice.