Realtime Router

One endpoint. The right model for every user and context.

A single endpoint that routes every request to the right model for that user. Optimize for cost, latency, engagement, revenue, or any metric you care about. 100+ models, no markup on provider rates.
0% markup · Hundreds of models · 1 line to integrate

Every routing use case, one endpoint

Route to different models based on user attributes like language, location, or subscription tier. Each user gets the model that fits them best.

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "User-Aware",
    "routes": [
      {
        "condition": { "cel_expression": "language == \"es\"" },
        "route": { "variants": [ { "variant": { "modelId": "openai/gpt-5.2" }, "weight": 100 } ] }
      },
      {
        "condition": { "cel_expression": "plan == \"free\"" },
        "route": { "variants": [ { "variant": { "modelId": "anthropic/claude-haiku-4-5" }, "weight": 100 } ] }
      }
    ],
    "defaultRoute": { "variants": [ { "variant": { "modelId": "anthropic/claude-sonnet-4-6" }, "weight": 100 } ] }
  }'

Pay provider rates. Nothing else.

Most routers take 5% on every call. Realtime Router charges zero markup during Research Preview. At $150K/month in LLM spend, that is $90K/year back in your pocket. After preview, bring your own keys for continued zero-markup access.

View pricing
Realtime Router: 0% markup on every provider call
Other routers: +5% markup

Route by who the user is, not just what they asked.

Pass language, country, tier, or emotion as metadata. CEL expressions evaluate conditions in real time and pick the right model per user. Update rules without redeploying.

  • CEL expressions on any metadata: language, tier, intent, session depth
  • Sticky user assignment for consistent experiences across sessions
  • Change routing logic without deploys or feature flags
Read routing docs
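A minimal sketch of what a metadata-aware request might look like from application code. The router model id (`inworld/my-router`) and the top-level `metadata` field name are illustrative assumptions, not confirmed API parameters; the CEL conditions shown above (e.g. `language == "es"`) evaluate against whatever metadata the request carries.

```python
import json

# Endpoint path follows the OpenAI-compatible base URL from this page.
ROUTER_URL = "https://api.inworld.ai/v1/chat/completions"

def build_routed_request(user_message: str, language: str, plan: str) -> dict:
    """Build a chat completions payload carrying per-user metadata
    for CEL routing conditions to evaluate (field names assumed)."""
    return {
        "model": "inworld/my-router",  # hypothetical router id
        "messages": [{"role": "user", "content": user_message}],
        "metadata": {"language": language, "plan": plan},  # assumed field name
    }

payload = build_routed_request("Hola, necesito ayuda", language="es", plan="free")
print(json.dumps(payload, indent=2))
```

With this payload, the first matching condition above would route the Spanish-language free-tier user before falling through to the default route.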
request + metadata
  language == "es" → GPT-5.2
  plan == "free" → Claude Haiku
  emotion == "frustrated" → Claude Sonnet
  default → Claude Sonnet 4.6

Change one line. Keep your entire codebase.

Swap your base URL to api.inworld.ai, update your API key, done. Full OpenAI and Anthropic SDK compatibility. No request or response changes.

  • OpenAI SDK: change base_url to https://api.inworld.ai/v1
  • Anthropic SDK: change base_url to https://api.inworld.ai/anthropic/v1
Migration guide
client.py
from openai import OpenAI

client = OpenAI(
    # before: base_url="https://api.openai.com/v1"
    base_url="https://api.inworld.ai/v1",
)

A/B test models on real users. No deploys.

Split traffic between Claude and GPT with sticky user assignment. Measure retention, satisfaction, or conversion per model. Ramp the winner to 100% via API or dashboard.

A/B testing docs
Live split: Claude Opus 4.6 50% / GPT-5.2 50%
TTFT: 320ms / 280ms
CSAT: 4.2 / 4.0
Cost: $0.003 / $0.004
*Illustrative data
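A 50/50 split like the one above can be expressed with the same routers schema shown in the earlier curl example (weighted `variants` on a route). A minimal sketch, with illustrative model ids and router name:

```python
import json

# Weighted variants on the default route express the traffic split;
# schema mirrors the routers API example shown earlier on this page.
ab_router = {
    "displayName": "Opus-vs-GPT",  # illustrative name
    "defaultRoute": {
        "variants": [
            {"variant": {"modelId": "anthropic/claude-opus-4-6"}, "weight": 50},
            {"variant": {"modelId": "openai/gpt-5.2"}, "weight": 50},
        ]
    },
}

weights = [v["weight"] for v in ab_router["defaultRoute"]["variants"]]
assert sum(weights) == 100  # the split must cover all traffic
print(json.dumps(ab_router, indent=2))
```

Ramping the winner to 100% is then a matter of updating the weights, with no redeploy of application code.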

Text and speech from one API call.

Add an audio parameter to any chat completions request. Get streamed text and speech in a single response. Route to the best LLM, then pipe straight into Realtime TTS. No second integration. No added latency.

  • Works with any model through any router configuration
  • #1 ranked TTS on Artificial Analysis
Explore Realtime TTS
Your app → Router → TTS

Single request body:
{
  "model": "inworld/my-router",
  "messages": [...],
  "audio": {
    "voice": "Sarah",
    "model": "inworld-tts-1.5-max"
  }
}
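The combined text-plus-speech body can be built like any other chat completions payload. A sketch that constructs the raw request shown above; whether a given SDK forwards the extra `audio` field unchanged is an assumption here, so this builds the dict directly:

```python
import json

def build_speech_request(messages: list) -> dict:
    """Attach the audio block from the example above to a standard
    chat completions payload (router id is illustrative)."""
    return {
        "model": "inworld/my-router",
        "messages": messages,
        "audio": {
            "voice": "Sarah",               # voice from the example above
            "model": "inworld-tts-1.5-max",
        },
    }

req = build_speech_request([{"role": "user", "content": "Read this aloud."}])
print(json.dumps(req, indent=2))
```

The same body works with any router configuration, so the text model can change per user while the TTS settings stay fixed.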

Your coding agent. Every model. No markup.

If a model goes down or rate-limits you, traffic reroutes instantly to the next best option. Your workflow never stops. Works with Cursor, Claude Code, Codex CLI, Aider, Continue, and Windsurf.

  • One env var to set up, no code changes required
  • Slash commands to switch models without leaving your editor
Explore vibe coding
slash-command routing
  /code → Claude Opus 4.6
  /review → DeepSeek V3.2
  /docs → Gemini 3 Flash
  default → MiniMax M2.5

From voice agents to vibe coding. One API, every pattern.

Production-ready out of the box

Routing is just the start. Everything else your production stack needs is built in.

Per-request observability

Every request logs the model selected, attempt chain, TTFT, token cost, and routing reason.

Mixpanel + BigQuery export

Stream request-level data to your analytics stack. Build dashboards on model performance and cost.

Prompt caching

Automatic caching for OpenAI, DeepSeek, Gemini. Explicit cache_control for Anthropic and Google.

Web search and grounding

Tool-based search via Exa and Google, or native provider grounding. Responses include citations.

Auto model selection

Set model to "auto" and sort by price, latency, throughput, intelligence, math, or coding benchmarks.

Full SSE streaming

Streaming for every model. Less than 5ms added latency to first token.

Bring your own keys

Use your own provider API keys for continued zero-markup access.

OpenAI + Anthropic SDK compatible

Drop-in replacement for both SDKs. Change your base URL and keep everything else.

Router vs. the alternatives

Capability | Direct APIs | Inworld Router | OpenRouter
--- | --- | --- | ---
Models available | 1 per provider | Hundreds, across all providers | 200+ across providers
Pricing markup | None | None (Research Preview) | ~5% commission
Context-aware routing (CEL) | Build it yourself | Built in | Not available
A/B testing with sticky users | Build it yourself | Built in | Not available
Automatic failover | Build it yourself | TTFT-based | Retry on error
Integrated TTS | Separate integration | Single API call | Not available
Per-request observability | Build it yourself | Built in + export | Basic logging
SDK compatibility | N/A (native) | OpenAI + Anthropic | OpenAI only
Caching | Provider-dependent | Implicit + explicit | Provider-dependent
Web search / grounding | Provider-dependent | Tool-based + native | Not available

FAQ

Can I migrate from OpenRouter or an Anthropic-based setup?
Yes. Migration guides are available for OpenRouter and Anthropic-based setups. The core change is updating your base_url and API key; your existing request structure stays the same.

Which models are available?
Router provides access to hundreds of models from leading providers, such as OpenAI, Anthropic, Google, and many more. You can see the full model list here.

How much does Router cost?
While Router is in Research Preview, you pay provider rates directly, with no markup or margin added. Rates for all models are available here.

Does Router add its own rate limits?
Realtime Router doesn't impose additional rate limits on top of providers. Provider-level rate limits are handled automatically by retrying with the next model in your fallback chain.

How is this different from other gateways?
Most gateways give you a unified API and basic fallback. Realtime Router offers more control and lets you run real experiments: conditional routing, dynamic tiering, traffic splitting by percentage, and sticky user assignment, with results pushed to your analytics platform of choice.

Start building with Realtime Router

One API key. Every model. No markup on provider rates.
Copyright © 2021-2026 Inworld AI