Get started
LLM Proxy

A drop-in proxy for every major LLM

Change one base URL and your existing OpenAI-compatible code routes to hundreds of models. Automatic failover, sticky per-user routing, full-fidelity logs, no rewrite.
Proxied call
Client
sdk: openai-node v5.x · baseURL: api.inworld.ai/v1

openai.chat.completions.create({ model: 'openai/gpt-5.4' })

Proxy
stream: sse passthrough · cache_hit: false · trace_id: r_2k8j9…

200 · 820ms · $0.004 · logged

Everything a proxy should do. Nothing you didn't ask for.

One base URL, every major LLM, cache hits on repeated prompts, and per-request logs you can query.
One-line swap

Change the base URL. That's the whole integration.

Your OpenAI client keeps working. Streaming, tool use, structured output stay untouched. One string in the constructor points you at Inworld.
agent.ts
import OpenAI from 'openai';
-const openai = new OpenAI();
+const openai = new OpenAI({ baseURL: "https://api.inworld.ai/v1" });
// every existing OpenAI call now routes through Inworld
// streaming, tool use, structured output all unchanged
Your SDK keeps working

No rewrite. No migration PR. Same SDK, new backend.

Whatever you're already using (openai-node, openai-python, LangChain, LlamaIndex, the Vercel AI SDK) routes through on a baseURL change alone. The Anthropic SDK hits /v1/messages.
Every SDK you already use
openai (Node) · identical
openai-python · identical
LangChain · baseURL only
LlamaIndex · baseURL only
Vercel AI SDK · identical
Anthropic SDK · via /v1/messages
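Every SDK in the table above speaks the same wire format, which is why a baseURL change is enough. As a minimal sketch, this is roughly the request any of them emits on your behalf; the `/chat/completions` path and `Bearer` auth header follow the standard OpenAI convention, and the `INWORLD_API_KEY` variable name is an assumption for illustration.

```typescript
const BASE_URL = 'https://api.inworld.ai/v1';

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Build the request any OpenAI-compatible SDK would send on your behalf.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: `${BASE_URL}/chat/completions`,
    headers: {
      Authorization: `Bearer ${process.env.INWORLD_API_KEY ?? ''}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Sending it is a plain fetch, no SDK required:
// const req = buildChatRequest('openai/gpt-5.4', [{ role: 'user', content: 'Hello' }]);
// const res = await fetch(req.url, { method: 'POST', headers: req.headers, body: req.body });
```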
Cache hits cost zero

Cut your bill ~73% on repeated prompts.

Exact-match cache is free by default. Semantic cache matches by meaning, so paraphrased prompts still hit. Cache returns in milliseconds at zero token cost.
Semantic cache, production traffic
~73% bill cut on repeated prompts
Exact-match and semantic cache. Milliseconds, zero tokens.
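To see why a cache hit costs zero tokens, here is a minimal sketch of the exact-match logic the proxy applies server-side: an identical request body returns the stored response without ever reaching a provider. (Semantic caching additionally matches paraphrases by meaning; that part is not sketched here.)

```typescript
type ChatRequest = { model: string; messages: { role: string; content: string }[] };

const cache = new Map<string, string>();

// Exact match: the key is the canonical serialized request.
function cacheKey(req: ChatRequest): string {
  return JSON.stringify(req);
}

function lookup(req: ChatRequest): { hit: boolean; response?: string } {
  const key = cacheKey(req);
  return cache.has(key)
    ? { hit: true, response: cache.get(key) } // zero tokens billed
    : { hit: false };                         // forward to the provider
}

function store(req: ChatRequest, response: string): void {
  cache.set(cacheKey(req), response);
}
```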
No bill surprises

Every call, every token, every cent.

See where your spend went, per user and per model, before finance asks. Live tail in the portal, queryable by API, exportable wherever you already look.
24h request timeline (last 24h)
Cache hit rate: 48% · 322 cache hits · 355 paid
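Per-user attribution works because each logged request carries the `user` field you pass on the call. As a sketch, rolling those logs up into per-user spend looks like this; the log record shape here is an assumption for illustration, and the proxy's actual log schema may name fields differently.

```typescript
interface RequestLog {
  user: string;     // the `user` field you passed on the request
  model: string;
  costUsd: number;
  cacheHit: boolean;
}

// Aggregate per-request logs into per-user spend.
function spendByUser(logs: RequestLog[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const log of logs) {
    // Cache hits cost zero tokens, so they add nothing to spend.
    const cost = log.cacheHit ? 0 : log.costUsd;
    totals.set(log.user, (totals.get(log.user) ?? 0) + cost);
  }
  return totals;
}
```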
Your parser keeps working

Your client code can't tell anything changed.

Streaming, tool calls, and structured output come back in the exact shape you already parse. Nothing in your app has to learn a new response format.
SSE streaming, untouched
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0].delta.content ?? '');
}
// Same SSE shape as OpenAI. Your parser doesn't care
// which provider is actually streaming the tokens.
// Tool use? Structured output? Passes through too.
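Tool calls stream as partial deltas in the standard OpenAI shape: each chunk carries an index plus a fragment of the JSON arguments string. The accumulator below is the same one you would use against OpenAI directly, which is the point: it needs no changes behind the proxy.

```typescript
// Shape of choices[0].delta.tool_calls entries in OpenAI-format SSE chunks.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON string, complete once the stream ends
}

// Merge streamed deltas into complete tool calls, keyed by index.
function accumulateToolCalls(deltas: ToolCallDelta[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const d of deltas) {
    const call = (calls[d.index] ??= { id: '', name: '', arguments: '' });
    if (d.id) call.id = d.id;
    if (d.function?.name) call.name = d.function.name;
    if (d.function?.arguments) call.arguments += d.function.arguments;
  }
  return calls;
}
```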
Model swap in one string

GPT today. Claude tomorrow. Same request shape.

Change the model field and every call routes to a new provider. Same SDK, same messages array, same streaming. Promote by updating an env var.
Explore the Router
Swap models without redeploying
openai/gpt-5.4
anthropic/claude-sonnet-4-6
google/gemini-3.1-pro
meta/llama-4-maverick
mistral/medium-2508
groq/gpt-oss-120b
Change one string. Your response parser keeps working.
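Promoting a model by env var can be as small as the sketch below; the `ROUTER_MODEL` variable name is an assumption for illustration, not a setting the proxy requires.

```typescript
// Read the model ID from the environment with a safe default, so a
// deploy-time env change swaps providers without touching code.
function resolveModel(env: Record<string, string | undefined>): string {
  return env.ROUTER_MODEL ?? 'openai/gpt-5.4';
}

// Usage with the proxied client:
// const model = resolveModel(process.env);
// await openai.chat.completions.create({ model, messages });
```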

The entire integration, in context

One line to swap. Your existing code keeps working. Hundreds of models become swappable strings.
import OpenAI from 'openai';

// Change the base URL. That's it.
const openai = new OpenAI({
  baseURL: 'https://api.inworld.ai/v1',
  apiKey: process.env.INWORLD_API_KEY,
});

// Your existing code keeps working
const response = await openai.chat.completions.create({
  model: 'openai/gpt-5.4',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
  tools: [/* unchanged */],
});

for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// Swap providers: change one string
const claude = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Hello' }],
});

Watch requests live in the portal.

Open the Router dashboard, fire a request, and watch it appear with the model, provider, token count, cost, latency, and cache-hit status in real time. Every call, every cent.
Open the portal

FAQ

Does my existing SDK keep working?

Yes. Your `openai` client, LangChain, LlamaIndex, and the Vercel AI SDK all work by changing the base URL and nothing else. Streaming, tool use, and structured output come back in the exact shape you already parse. Anthropic SDK users can hit `/v1/messages`.

How many models can I reach?

Hundreds, across OpenAI, Anthropic, Google, Meta (Llama), Mistral, xAI, Groq, Fireworks, DeepSeek, Qwen, and more. Prefix the model ID with the provider (`anthropic/claude-sonnet-4-6`) or pass `auto` to let Router pick by price, latency, or intelligence. See pricing.

How does caching work?

Exact-match caching is always on and free. Semantic caching (match by meaning) is opt-in via the `cache` field. Cached responses return in milliseconds at zero token cost. For repeated prompts in production this routinely cuts bills by ~73%.

What gets logged?

Every request is logged with model, provider, tokens in/out, cost, latency, cache hit, fallback events, and user attribution (via the `user` field). Live tail in the portal, queryable by API, exportable for downstream tooling. No third-party tracing service required.

Does streaming pass through unchanged?

Yes. SSE streaming passes through byte-for-byte in the OpenAI format, no transcoding, no special client. Tool-use chunks, function-call argument streaming, and structured-output deltas all come through unchanged.

What does the proxy cost?

Router is free during Research Preview, with zero markup on pass-through model costs. You pay the underlying provider rate through a single Inworld bill.

How hard is the migration?

Most teams ship the change in a single PR: update the baseURL constant, update the API key env var, deploy. No migration window, no dual-write, no provider outage risk. If you need to roll back, change one string.

What do enterprise teams get?

Enterprise controls on Router include conditional routing, fallback orchestration, per-team budgets, and observability dashboards on top of the same proxy. Talk to the team if you need SOC 2, SSO, or custom retention.

Change one line. Ship everything else unchanged.

Drop-in OpenAI compatibility, caching, fallbacks, logs, hundreds of models. Free during Research Preview.
Copyright © 2021-2026 Inworld AI