AI Gateway

Connect to every major LLM through one API

Hundreds of models from OpenAI, Anthropic, Google, Meta, and Mistral through a single OpenAI-compatible endpoint. Sticky per-user routing, automatic failover, no markup on pass-through.
Gateway request
Request
model: openai/gpt-5.4 · fallback: anthropic/claude-sonnet-4-6

POST /v1/chat/completions · OpenAI SDK

Response
actual_model: openai/gpt-5.4 · cache_hit: false · total_ms: 820

200 · 1,180 tokens · $0.004

Everything a production team needs from a gateway.

Drop-in OpenAI compatibility, cross-provider failover, per-request cost visibility. One endpoint across every major LLM.
One API, every provider

Every major LLM, one endpoint away.

OpenAI, Anthropic, Google, Meta, Mistral, Groq, DeepSeek, Qwen. Hundreds of models on a single OpenAI-compatible endpoint. Your app never knows which provider answered.
One endpoint, every provider
Your code (OpenAI SDK) → /v1/chat/completions → OpenAI · Anthropic · Google · Meta · Mistral · xAI · Groq · Fireworks · DeepSeek · Qwen
Drop-in OpenAI swap

Change one line, ship in minutes.

Swap the base URL on your existing OpenAI client and point the key at Inworld. Every call routes through with the same streaming, the same tool use, the same SDK.
Change base URL. Add models. Ship.
Before
new OpenAI({
  apiKey: OPENAI_KEY,
}).chat.completions.create({
  model: 'gpt-5.4',
})
After
new OpenAI({
  baseURL: 'https://api.inworld.ai/v1',
  apiKey: INWORLD_KEY,
}).chat.completions.create({
  model: 'openai/gpt-5.4',
  models: ['anthropic/claude-sonnet-4-6'],
})
Same SDK. One line for the endpoint, one field for fallbacks.
Auto-failover

Stay up when providers don't.

Declare an ordered list of fallback models on the request. When the primary returns 429 or 5xx, Router retries on the next provider. Streaming survives the swap.
Never down when OpenAI is
429 · openai/gpt-5.4 · primary · rate limited
200 · anthropic/claude-sonnet-4-6 · backup · responded
One ordered list. Cross-provider. Streaming survives the swap.
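The retry rule runs server-side inside the gateway, so none of this lives in your app. As a rough client-side sketch of the same ordered-fallback policy (the `UpstreamError` type and the attempt signature are illustrative, not part of the Inworld API):

```typescript
// Sketch of the ordered-fallback rule the gateway applies server-side:
// try each model in order, moving on only when the failure is retryable
// (429 rate limit or any 5xx). UpstreamError is an illustrative stand-in
// for whatever error your HTTP layer raises.
class UpstreamError extends Error {
  constructor(public status: number) {
    super(`upstream returned ${status}`);
  }
}

async function withFallback<T>(
  models: string[],
  attempt: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await attempt(model);
    } catch (err) {
      lastError = err;
      const retryable =
        err instanceof UpstreamError && (err.status === 429 || err.status >= 500);
      if (!retryable) throw err; // auth errors, bad requests, etc. surface immediately
    }
  }
  throw lastError; // every model in the chain failed with a retryable error
}
```

On the gateway itself, passing `models: ['anthropic/claude-sonnet-4-6']` on the request gets you this behavior without writing any of the above.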
Cost controls

Route smart. Spend less. Ship the same product.

Sort by price, latency, or intelligence per request. Conditional routing escalates when complexity demands it. With implicit and semantic caching, repeated prompts cost zero.
See cost optimization
Route smart · spend less
40-70%
Spend cut by intelligent routing
Cheap for easy prompts. Best-in-class when it matters.
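The gateway applies escalation rules server-side; the sketch below only illustrates the idea. The model IDs, the length threshold, and the keyword list are all assumptions, not the gateway's actual routing logic:

```typescript
// Hypothetical escalation heuristic: send short, simple prompts to a
// cheap model and escalate when complexity signals fire. Thresholds and
// model choices here are illustrative assumptions.
function pickModel(prompt: string): string {
  const longPrompt = prompt.length > 2000;
  const needsReasoning = /\b(prove|derive|step[- ]by[- ]step|debug)\b/i.test(prompt);
  return longPrompt || needsReasoning
    ? 'openai/gpt-5.4' // best-in-class when complexity demands it
    : 'groq/gpt-oss-120b'; // cheap and fast for easy prompts
}
```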
See where the money goes

Every token, every cent, every millisecond.

Know which model, which user, and which hour is spending your budget. Spot runaway prompts before the bill does. No separate tracing service to bolt on.
Spend heatmap · models × hours
last 24h
[Heatmap: spend by hour (00–20 UTC) for openai/gpt-5.4, anthropic/claude-sonnet-4-6, google/gemini-3.1-pro, and groq/gpt-oss-120b; color scale low → high]
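A heatmap like this is just a fold over per-request log records. The sketch below shows one way to bucket spend by model and hour; the `LogRecord` shape mirrors the dimensions the gateway logs (model, cost, timestamp) but is an illustrative assumption, not the logs API schema:

```typescript
// Fold per-request log records into model x hour spend buckets,
// the data behind a spend heatmap. Field names are illustrative.
interface LogRecord {
  model: string;
  costUsd: number;
  timestamp: Date;
}

function spendByModelHour(records: LogRecord[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const r of records) {
    const key = `${r.model}@${r.timestamp.getUTCHours()}`; // e.g. "openai/gpt-5.4@8"
    buckets.set(key, (buckets.get(key) ?? 0) + r.costUsd);
  }
  return buckets;
}
```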
The reasoning layer for voice

When your app adds voice, you're already on the pipeline.

The same gateway that reasons for your chat app reasons for your voice agents. One key, one bill, one place to watch every call across every surface.
See voice agents
One gateway, four workloads
Voice agents · Realtime API + Router
Vibe coding · Router + GPT / Claude
Roleplay · Router + chat agents
Async LLM · Router + cache + batch
The same endpoint that powers our voice pipeline powers your reasoning.

Get started in three lines

The base URL swap is literally all the integration. Every SDK you already use keeps working, every endpoint you already call keeps returning.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.inworld.ai/v1',
  apiKey: process.env.INWORLD_API_KEY,
});

// Use any model through the same endpoint
const completion = await client.chat.completions.create({
  model: 'openai/gpt-5.4',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
});

for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// Need a fallback? Add one field.
const safe = await client.chat.completions.create({
  model: 'openai/gpt-5.4',
  messages: [{ role: 'user', content: 'Hello' }],
  // @ts-expect-error extra_body
  extra_body: {
    models: ['anthropic/claude-sonnet-4-6', 'google/gemini-3.1-pro'],
  },
});

Configure in the portal.

Set up conditional routes with CEL, define fallback chains, enable semantic caching, and watch live traffic across providers from one dashboard. Everything the API does, configurable without writing code.
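A conditional route in CEL might look roughly like the fragment below. The attribute names (`request.estimated_tokens`, `user.tier`) and model IDs are purely hypothetical; consult the portal's request schema for the real variables it exposes:

```cel
// Hypothetical route rule: escalate long or pro-tier requests,
// otherwise take the cheap path. Variable names are illustrative.
request.estimated_tokens > 4000 || user.tier == "pro"
  ? "openai/gpt-5.4"
  : "groq/gpt-oss-120b"
```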
Open the portal

FAQ

Is the gateway OpenAI-compatible?
Yes. The gateway speaks the OpenAI Chat Completions protocol: same endpoint shape, same request body, same streaming SSE format, same tool use and structured output. Change the base URL on your OpenAI client, swap the API key, and every call routes through Inworld with no code changes.
Which models can I use?
Hundreds of models across OpenAI, Anthropic, Google, Meta (Llama), Mistral, xAI, Groq, Fireworks, DeepSeek, Qwen, and more. Specify a model by provider-qualified ID (`openai/gpt-5.4`), raw ID (`gpt-oss-120b`), or `auto` to let the gateway pick by price, latency, intelligence, or any other sort metric you configure. See pricing for rates.
How does failover work?
Add a `models` array to the request with an ordered list of backup models. When the primary returns a 429 or 5xx, or hits a capacity cap, the gateway retries on the next model. The request shape is identical across providers, so your response-parsing code never changes.
Does streaming work?
Yes. Set `stream: true` on any request and the gateway passes through server-sent events in the OpenAI format. Streaming continues through a fallback swap; your client code never sees the transition.
What does it cost?
Router is free during Research Preview, with zero markup on pass-through model costs. You pay the underlying provider rate through a single Inworld bill. See the cost optimization guide for routing strategies that cut underlying costs 40-70%.
How does caching work?
Implicit caching on identical prompts is free. Semantic caching (matching by meaning, not exact text) is configurable per request. Cached responses return in milliseconds at zero token cost.
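To make the implicit/semantic distinction concrete, here is a local sketch of what implicit caching does: an identical request body is an exact-match cache hit. The gateway does this transparently server-side; this memo only illustrates the key derivation:

```typescript
// Exact-match memo over request bodies: identical prompt => identical
// key => cache hit at zero token cost. Illustrative only; the gateway
// performs this server-side.
const memo = new Map<string, string>();

function cachedCompletion(
  body: object,
  run: (body: object) => string,
): { text: string; cacheHit: boolean } {
  const key = JSON.stringify(body); // identical request body => identical key
  const hit = memo.get(key);
  if (hit !== undefined) return { text: hit, cacheHit: true };
  const text = run(body);
  memo.set(key, text);
  return { text, cacheHit: false };
}
```

Semantic caching relaxes the exact-match key to similarity of meaning, which is why it is configurable per request rather than always on.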
What observability do I get?
Every request is logged with model, provider, tokens, cost, latency, cache hit, fallback events, and user attribution. Access it via the portal dashboard or the logs API. No separate tracing service required.
Does it work with voice agents?
Yes. The same endpoint underpins the Inworld Realtime API, so when you add voice you're already on the pipeline. Router sits as the reasoning layer in every Inworld voice agent.

One endpoint. Hundreds of LLMs. Zero lock-in.

Drop-in OpenAI compatibility. Auto-failover. Cost controls. Free during Research Preview.
Copyright © 2021-2026 Inworld AI