Get started
LLM Proxy

A drop-in proxy for every major LLM

Change one base URL and your existing OpenAI-compatible code routes to hundreds of models. Automatic failover, sticky per-user routing, full-fidelity logs, no rewrite.
Proxied call
Client
sdk: openai-node v5.x · baseURL: api.inworld.ai/v1

openai.chat.completions.create({ model: 'openai/gpt-5.4' })

Proxy
stream: sse passthrough · cache_hit: false · trace_id: r_2k8j9…

200 · 820ms · $0.004 · logged

Everything a proxy should do. Nothing you didn't ask for.

One base URL, every major LLM, cache hits on repeated prompts, and per-request logs you can query.
One-line swap

Change the base URL. That's the whole integration.

Your OpenAI client keeps working. Streaming, tool use, structured output stay untouched. One string in the constructor points you at Inworld.
agent.ts
import OpenAI from 'openai';
-const openai = new OpenAI();
+const openai = new OpenAI({ baseURL: "https://api.inworld.ai/v1" });
// every existing OpenAI call now routes through Inworld
// streaming, tool use, structured output all unchanged
Your SDK keeps working

No rewrite. No migration PR. Same SDK, new backend.

Whatever you're already using (openai-node, openai-python, LangChain, LlamaIndex, the Vercel AI SDK) routes through on a baseURL change alone. The Anthropic SDK hits /v1/messages.
Every SDK you already use
openai (Node) · identical
openai-python · identical
LangChain · baseURL only
LlamaIndex · baseURL only
Vercel AI SDK · identical
Anthropic SDK · via /v1/messages
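Every SDK in the table above speaks the same wire format, which is why a baseURL change is enough. As a minimal sketch, this is roughly the request any of them emits on your behalf; the `/chat/completions` path and `Bearer` auth header follow the standard OpenAI convention, and the `INWORLD_API_KEY` variable name is an assumption for illustration.

```typescript
const BASE_URL = 'https://api.inworld.ai/v1';

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Build the request any OpenAI-compatible SDK would send on your behalf.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: `${BASE_URL}/chat/completions`,
    headers: {
      Authorization: `Bearer ${process.env.INWORLD_API_KEY ?? ''}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Sending it is a plain fetch, no SDK required:
// const req = buildChatRequest('openai/gpt-5.4', [{ role: 'user', content: 'Hello' }]);
// const res = await fetch(req.url, { method: 'POST', headers: req.headers, body: req.body });
```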
Cache hits cost zero

Cut your bill ~73% on repeated prompts.

Exact-match cache is free by default. Semantic cache matches by meaning, so paraphrased prompts still hit. Cache returns in milliseconds at zero token cost.
Semantic cache, production traffic
~73% bill cut on repeated prompts
Exact-match and semantic cache. Milliseconds, zero tokens.
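To see why a cache hit costs zero tokens, here is a minimal sketch of the exact-match logic the proxy applies server-side: an identical request body returns the stored response without ever reaching a provider. (Semantic caching additionally matches paraphrases by meaning; that part is not sketched here.)

```typescript
type ChatRequest = { model: string; messages: { role: string; content: string }[] };

const cache = new Map<string, string>();

// Exact match: the key is the canonical serialized request.
function cacheKey(req: ChatRequest): string {
  return JSON.stringify(req);
}

function lookup(req: ChatRequest): { hit: boolean; response?: string } {
  const key = cacheKey(req);
  return cache.has(key)
    ? { hit: true, response: cache.get(key) } // zero tokens billed
    : { hit: false };                         // forward to the provider
}

function store(req: ChatRequest, response: string): void {
  cache.set(cacheKey(req), response);
}
```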
No bill surprises

Every call, every token, every cent.

See where your spend went, per user and per model, before finance asks. Live tail in the portal, queryable by API, exportable wherever you already look.
24h request timeline (last 24h)
Cache hit rate: 48% · 322 cache hits · 355 paid
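Per-user attribution works because each logged request carries the `user` field you pass on the call. As a sketch, rolling those logs up into per-user spend looks like this; the log record shape here is an assumption for illustration, and the proxy's actual log schema may name fields differently.

```typescript
interface RequestLog {
  user: string;     // the `user` field you passed on the request
  model: string;
  costUsd: number;
  cacheHit: boolean;
}

// Aggregate per-request logs into per-user spend.
function spendByUser(logs: RequestLog[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const log of logs) {
    // Cache hits cost zero tokens, so they add nothing to spend.
    const cost = log.cacheHit ? 0 : log.costUsd;
    totals.set(log.user, (totals.get(log.user) ?? 0) + cost);
  }
  return totals;
}
```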
Your parser keeps working

Your client code can't tell anything changed.

Streaming, tool calls, and structured output come back in the exact shape you already parse. Nothing in your app has to learn a new response format.
SSE streaming, untouched
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0].delta.content ?? '');
}
// Same SSE shape as OpenAI. Your parser doesn't care
// which provider is actually streaming the tokens.
// Tool use? Structured output? Passes through too.
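Tool calls stream as partial deltas in the standard OpenAI shape: each chunk carries an index plus a fragment of the JSON arguments string. The accumulator below is the same one you would use against OpenAI directly, which is the point: it needs no changes behind the proxy.

```typescript
// Shape of choices[0].delta.tool_calls entries in OpenAI-format SSE chunks.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON string, complete once the stream ends
}

// Merge streamed deltas into complete tool calls, keyed by index.
function accumulateToolCalls(deltas: ToolCallDelta[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const d of deltas) {
    const call = (calls[d.index] ??= { id: '', name: '', arguments: '' });
    if (d.id) call.id = d.id;
    if (d.function?.name) call.name = d.function.name;
    if (d.function?.arguments) call.arguments += d.function.arguments;
  }
  return calls;
}
```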
Model swap in one string

GPT today. Claude tomorrow. Same request shape.

Change the model field and every call routes to a new provider. Same SDK, same messages array, same streaming. Promote by updating an env var.
Explore the Router
Swap models without redeploying
openai/gpt-5.4
anthropic/claude-sonnet-4-6
google/gemini-3.1-pro
meta/llama-4-maverick
mistral/medium-2508
groq/gpt-oss-120b
Change one string. Your response parser keeps working.
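Promoting a model by env var can be as small as the sketch below; the `ROUTER_MODEL` variable name is an assumption for illustration, not a setting the proxy requires.

```typescript
// Read the model ID from the environment with a safe default, so a
// deploy-time env change swaps providers without touching code.
function resolveModel(env: Record<string, string | undefined>): string {
  return env.ROUTER_MODEL ?? 'openai/gpt-5.4';
}

// Usage with the proxied client:
// const model = resolveModel(process.env);
// await openai.chat.completions.create({ model, messages });
```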

The entire integration, in context

One line to swap. Your existing code keeps working. Hundreds of models become swappable strings.
import OpenAI from 'openai';

// Change the base URL. That's it.
const openai = new OpenAI({
  baseURL: 'https://api.inworld.ai/v1',
  apiKey: process.env.INWORLD_API_KEY,
});

// Your existing code keeps working
const response = await openai.chat.completions.create({
  model: 'openai/gpt-5.4',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
  tools: [/* unchanged */],
});

for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// Swap providers: change one string
const claude = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Hello' }],
});

Watch requests live in the portal.

Open the Router dashboard, fire a request, and watch it appear with the model, provider, token count, cost, latency, and cache-hit status in real time. Every call, every cent.
Open the portal

FAQ

Does my existing SDK keep working?

Yes. Your `openai` client, LangChain, LlamaIndex, and the Vercel AI SDK all work by changing the base URL and nothing else. Streaming, tool use, and structured output come back in the exact shape you already parse. Anthropic SDK users can hit `/v1/messages`.

How many models can I reach?

Hundreds, across OpenAI, Anthropic, Google, Meta (Llama), Mistral, xAI, Groq, Fireworks, DeepSeek, Qwen, and more. Prefix the model ID with the provider (`anthropic/claude-sonnet-4-6`) or pass `auto` to let Router pick by price, latency, or intelligence. See pricing.

How does caching work?

Exact-match caching is always on and free. Semantic caching (match by meaning) is opt-in via the `cache` field. Cached responses return in milliseconds at zero token cost. For repeated prompts in production this routinely cuts bills by ~73%.

What gets logged?

Every request is logged with model, provider, tokens in/out, cost, latency, cache hit, fallback events, and user attribution (via the `user` field). Live tail in the portal, queryable by API, exportable for downstream tooling. No third-party tracing service required.

Does streaming pass through unchanged?

Yes. SSE streaming passes through byte-for-byte in the OpenAI format, no transcoding, no special client. Tool-use chunks, function-call argument streaming, and structured-output deltas all come through unchanged.

What does the proxy cost?

Router is free during Research Preview, with zero markup on pass-through model costs. You pay the underlying provider rate through a single Inworld bill.

How hard is the migration?

Most teams ship the change in a single PR: update the baseURL constant, update the API key env var, deploy. No migration window, no dual-write, no provider outage risk. If you need to roll back, change one string.

What do enterprise teams get?

Enterprise controls on Router include conditional routing, fallback orchestration, per-team budgets, and observability dashboards on top of the same proxy. Talk to the team if you need SOC 2, SSO, or custom retention.

Change one line. Ship everything else unchanged.

Drop-in OpenAI compatibility, caching, fallbacks, logs, hundreds of models. Free during Research Preview.
Copyright © 2021-2026 Inworld AI