AI Gateway

Connect to every major LLM through one API

Hundreds of models from OpenAI, Anthropic, Google, Meta, and Mistral through a single OpenAI-compatible endpoint. Sticky per-user routing, automatic failover, no markup on pass-through.
Gateway request
Request
model: openai/gpt-5.4 · fallback: anthropic/claude-sonnet-4-6

POST /v1/chat/completions · OpenAI SDK

Response
actual_model: openai/gpt-5.4 · cache_hit: false · total_ms: 820

200 · 1,180 tokens · $0.004

Everything a production team needs from a gateway.

Drop-in OpenAI compatibility, cross-provider failover, per-request cost visibility. One endpoint across every major LLM.
One API, every provider

Every major LLM, one endpoint away.

OpenAI, Anthropic, Google, Meta, Mistral, Groq, DeepSeek, Qwen. Hundreds of models on a single OpenAI-compatible endpoint. Your app never knows which provider answered.
One endpoint, every provider
Your code (OpenAI SDK) → /v1/chat/completions → OpenAI · Anthropic · Google · Meta · Mistral · xAI · Groq · Fireworks · DeepSeek · Qwen
Drop-in OpenAI swap

Change one line, ship in minutes.

Swap the base URL on your existing OpenAI client and point the key at Inworld. Every call routes through with the same streaming, the same tool use, the same SDK.
Change base URL. Add models. Ship.
Before
new OpenAI({
  apiKey: OPENAI_KEY,
}).chat.completions.create({
  model: 'gpt-5.4',
})
After
new OpenAI({
  baseURL: 'https://api.inworld.ai/v1',
  apiKey: INWORLD_KEY,
}).chat.completions.create({
  model: 'openai/gpt-5.4',
  models: ['anthropic/claude-sonnet-4-6'],
})
Same SDK. One line for the endpoint, one field for fallbacks.
Auto-failover

Stay up when providers don't.

Declare an ordered list of fallback models on the request. When the primary returns 429 or 5xx, Router retries on the next provider. Streaming survives the swap.
Never down when OpenAI is
429 · openai/gpt-5.4 · primary · rate limited
200 · anthropic/claude-sonnet-4-6 · backup · responded
One ordered list. Cross-provider. Streaming survives the swap.
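The retry rule runs server-side inside the gateway, so none of this lives in your app. As a rough client-side sketch of the same ordered-fallback policy (the `UpstreamError` type and the attempt signature are illustrative, not part of the Inworld API):

```typescript
// Sketch of the ordered-fallback rule the gateway applies server-side:
// try each model in order, moving on only when the failure is retryable
// (429 rate limit or any 5xx). UpstreamError is an illustrative stand-in
// for whatever error your HTTP layer raises.
class UpstreamError extends Error {
  constructor(public status: number) {
    super(`upstream returned ${status}`);
  }
}

async function withFallback<T>(
  models: string[],
  attempt: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await attempt(model);
    } catch (err) {
      lastError = err;
      const retryable =
        err instanceof UpstreamError && (err.status === 429 || err.status >= 500);
      if (!retryable) throw err; // auth errors, bad requests, etc. surface immediately
    }
  }
  throw lastError; // every model in the chain failed with a retryable error
}
```

On the gateway itself, passing `models: ['anthropic/claude-sonnet-4-6']` on the request gets you this behavior without writing any of the above.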
Cost controls

Route smart. Spend less. Ship the same product.

Sort by price, latency, or intelligence per request. Conditional routing escalates when complexity demands it. With implicit and semantic caching, repeated prompts cost zero.
See cost optimization
Route smart · spend less
40-70%
Spend cut by intelligent routing
Cheap for easy prompts. Best-in-class when it matters.
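The gateway applies escalation rules server-side; the sketch below only illustrates the idea. The model IDs, the length threshold, and the keyword list are all assumptions, not the gateway's actual routing logic:

```typescript
// Hypothetical escalation heuristic: send short, simple prompts to a
// cheap model and escalate when complexity signals fire. Thresholds and
// model choices here are illustrative assumptions.
function pickModel(prompt: string): string {
  const longPrompt = prompt.length > 2000;
  const needsReasoning = /\b(prove|derive|step[- ]by[- ]step|debug)\b/i.test(prompt);
  return longPrompt || needsReasoning
    ? 'openai/gpt-5.4' // best-in-class when complexity demands it
    : 'groq/gpt-oss-120b'; // cheap and fast for easy prompts
}
```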
See where the money goes

Every token, every cent, every millisecond.

Know which model, which user, and which hour is spending your budget. Spot runaway prompts before the bill does. No separate tracing service to bolt on.
Spend heatmap · models × hours
last 24h
[Heatmap: spend by hour (00–20 UTC) for openai/gpt-5.4, anthropic/claude-sonnet-4-6, google/gemini-3.1-pro, and groq/gpt-oss-120b; color scale low → high]
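A heatmap like this is just a fold over per-request log records. The sketch below shows one way to bucket spend by model and hour; the `LogRecord` shape mirrors the dimensions the gateway logs (model, cost, timestamp) but is an illustrative assumption, not the logs API schema:

```typescript
// Fold per-request log records into model x hour spend buckets,
// the data behind a spend heatmap. Field names are illustrative.
interface LogRecord {
  model: string;
  costUsd: number;
  timestamp: Date;
}

function spendByModelHour(records: LogRecord[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const r of records) {
    const key = `${r.model}@${r.timestamp.getUTCHours()}`; // e.g. "openai/gpt-5.4@8"
    buckets.set(key, (buckets.get(key) ?? 0) + r.costUsd);
  }
  return buckets;
}
```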
The reasoning layer for voice

When your app adds voice, you're already on the pipeline.

The same gateway that reasons for your chat app reasons for your voice agents. One key, one bill, one place to watch every call across every surface.
See voice agents
One gateway, four workloads
Voice agents · Realtime API + Router
Vibe coding · Router + GPT / Claude
Roleplay · Router + chat agents
Async LLM · Router + cache + batch
The same endpoint that powers our voice pipeline powers your reasoning.

Get started in three lines

The base URL swap is literally all the integration. Every SDK you already use keeps working, every endpoint you already call keeps returning.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.inworld.ai/v1',
  apiKey: process.env.INWORLD_API_KEY,
});

// Use any model through the same endpoint
const completion = await client.chat.completions.create({
  model: 'openai/gpt-5.4',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
});

for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// Need a fallback? Add one field.
const safe = await client.chat.completions.create({
  model: 'openai/gpt-5.4',
  messages: [{ role: 'user', content: 'Hello' }],
  // @ts-expect-error extra_body
  extra_body: {
    models: ['anthropic/claude-sonnet-4-6', 'google/gemini-3.1-pro'],
  },
});

Configure in the portal.

Set up conditional routes with CEL, define fallback chains, enable semantic caching, and watch live traffic across providers from one dashboard. Everything the API does, configurable without writing code.
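A conditional route in CEL might look roughly like the fragment below. The attribute names (`request.estimated_tokens`, `user.tier`) and model IDs are purely hypothetical; consult the portal's request schema for the real variables it exposes:

```cel
// Hypothetical route rule: escalate long or pro-tier requests,
// otherwise take the cheap path. Variable names are illustrative.
request.estimated_tokens > 4000 || user.tier == "pro"
  ? "openai/gpt-5.4"
  : "groq/gpt-oss-120b"
```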
Open the portal

FAQ

Is the gateway OpenAI-compatible?
Yes. The gateway speaks the OpenAI Chat Completions protocol: same endpoint shape, same request body, same streaming SSE format, same tool use and structured output. Change the base URL on your OpenAI client, swap the API key, and every call routes through Inworld with no code changes.
Which models can I use?
Hundreds of models across OpenAI, Anthropic, Google, Meta (Llama), Mistral, xAI, Groq, Fireworks, DeepSeek, Qwen, and more. Specify a model by provider-qualified ID (`openai/gpt-5.4`), raw ID (`gpt-oss-120b`), or `auto` to let the gateway pick by price, latency, intelligence, or any other sort metric you configure. See pricing for rates.
How does failover work?
Add a `models` array to the request with an ordered list of backup models. When the primary returns a 429 or 5xx, or hits a capacity cap, the gateway retries on the next model. The request shape is identical across providers, so your response-parsing code never changes.
Does streaming work?
Yes. Set `stream: true` on any request and the gateway passes through server-sent events in the OpenAI format. Streaming continues through a fallback swap; your client code never sees the transition.
What does it cost?
Router is free during Research Preview, with zero markup on pass-through model costs. You pay the underlying provider rate through a single Inworld bill. See the cost optimization guide for routing strategies that cut underlying costs 40-70%.
How does caching work?
Implicit caching on identical prompts is free. Semantic caching (matching by meaning, not exact text) is configurable per request. Cached responses return in milliseconds at zero token cost.
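To make the implicit/semantic distinction concrete, here is a local sketch of what implicit caching does: an identical request body is an exact-match cache hit. The gateway does this transparently server-side; this memo only illustrates the key derivation:

```typescript
// Exact-match memo over request bodies: identical prompt => identical
// key => cache hit at zero token cost. Illustrative only; the gateway
// performs this server-side.
const memo = new Map<string, string>();

function cachedCompletion(
  body: object,
  run: (body: object) => string,
): { text: string; cacheHit: boolean } {
  const key = JSON.stringify(body); // identical request body => identical key
  const hit = memo.get(key);
  if (hit !== undefined) return { text: hit, cacheHit: true };
  const text = run(body);
  memo.set(key, text);
  return { text, cacheHit: false };
}
```

Semantic caching relaxes the exact-match key to similarity of meaning, which is why it is configurable per request rather than always on.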
What observability do I get?
Every request is logged with model, provider, tokens, cost, latency, cache hit, fallback events, and user attribution. Access it via the portal dashboard or the logs API. No separate tracing service required.
Does it work with voice agents?
Yes. The same endpoint underpins the Inworld Realtime API, so when you add voice you're already on the pipeline. Router sits as the reasoning layer in every Inworld voice agent.

One endpoint. Hundreds of LLMs. Zero lock-in.

Drop-in OpenAI compatibility. Auto-failover. Cost controls. Free during Research Preview.
Copyright © 2021-2026 Inworld AI