Fallbacks

Stay online when any AI provider goes down

List your backup models in one ordered array and Router swaps to the next one the moment a provider refuses. One OpenAI-compatible endpoint, every provider, no custom retry logic.
Failover event: primary openai/gpt-5.4 · 429 · retry-after 60 (auto-retried 1× with backoff) → fallback anthropic/claude-sonnet-4-6 · 200 · streaming (stream continuity preserved)

Powered by
Router

Stay up when any provider goes down.

Ordered cross-provider chains, streaming continuity, transparent retries. One JSON field turns a single point of failure into four nines of uptime.
Cross-provider uptime

Four nines when no single provider hits three.

Chain three providers in an ordered array and, assuming outages are independent, combined uptime clears 99.99%. One field, no custom retry logic.
Three providers, one chain
OpenAI 99.5% × Anthropic 99.5% × Google 99.5% → 99.99% combined uptime
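The arithmetic behind that figure fits in a few lines. This sketch assumes provider outages are statistically independent, which real incidents only approximate:

```javascript
// Compound uptime of an ordered fallback chain: the chain is down only
// when every provider is down at once (independence assumed).
function chainUptime(uptimes) {
  const downtime = uptimes.reduce((acc, u) => acc * (1 - u), 1);
  return 1 - downtime;
}

console.log(chainUptime([0.995, 0.995]));        // ≈ 0.999975 — four nines from just two providers
console.log(chainUptime([0.995, 0.995, 0.995])); // ≈ 0.99999987 — three providers clear it comfortably
```

Two providers already compound past four nines; the third is margin.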
No retry code to write

Declare the chain. Router handles the swap.

List the models you want tried, in order, and Router moves on when one refuses. Your app never writes backoff logic again.
Failover chain · time to success: openai/gpt-5.4 → 429, anthropic/claude-sonnet-4-6 → 503, google/gemini-3.1-pro → 200, all inside one second (0–1000ms)
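For contrast, this is roughly the loop you would otherwise hand-roll. It's a sketch: `callModel` is a hypothetical transport helper that throws on 429/503, not any real SDK call.

```javascript
// The DIY failover loop that the `models` field makes unnecessary.
// `callModel` is hypothetical; every name here is illustrative.
async function completeWithFallbacks(chain, request, callModel) {
  let lastError;
  for (const model of chain) {
    try {
      return await callModel(model, request); // first success wins
    } catch (err) {
      lastError = err; // 429/503/timeout — move down the chain
    }
  }
  throw lastError; // every model in the chain refused
}
```

With Router, this entire function collapses into the `models` array on the request itself.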
Cross-provider by default

Same-vendor fallbacks fail in the same outage.

A gpt-5.4 to gpt-4.1 chain doesn't help when OpenAI itself is down. Mix OpenAI, Anthropic, and Google in one list instead.
Cross-provider fallback matrix: every cross-vendor pairing of OpenAI, Anthropic, and Google stays up when the same-vendor pair fails.
Streaming survives the swap

Mid-stream failure, mid-stream recovery.

If a primary fails after streaming starts, Router swaps to the next model and resumes. Your SSE client sees one continuous response.
Streaming survives the swap
// stream starts on openai/gpt-5.4
chunk 0: 'The core'
chunk 1: ' idea is'
// 503 at chunk 2 — Router swaps to Claude
chunk 2: ' to treat'
chunk 3: ' async as'
// your client never saw the transition
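What the client observes can be simulated with a plain async iterable — toy data, not Router's wire format:

```javascript
// Toy simulation: one continuous async stream, even though the producer
// switched "providers" between chunk 1 and chunk 2. Data is illustrative.
async function* routedStream() {
  yield 'The core';
  yield ' idea is';
  // primary 503s here — Router resumes on the fallback transparently
  yield ' to treat';
  yield ' async as';
}

async function consume() {
  let text = '';
  for await (const chunk of routedStream()) text += chunk; // one loop, no reconnect
  return text;
}

consume().then(console.log); // "The core idea is to treat async as"
```

From the consumer's side there is one loop and one response; the swap is invisible.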
Keep the SDK you already have

Your OpenAI SDK, plus one field. That's the whole integration.

No new client to install, no gateway library to learn. Fallbacks ride on the request you're already sending, so your dependencies stay where they are.
No gateway library to install
one JSON field
client.chat.completions.create({
model: 'openai/gpt-5.4',
messages: [...],
// the one field that gives you fallbacks
models: ['anthropic/claude-sonnet-4-6', 'google/gemini-3.1-pro'],
});
// no LiteLLM. no custom retry wrapper. just OpenAI SDK + one field.
Post-mortem-ready receipts

Know exactly what happened when it mattered.

Every swap logs what tripped it, how many retries it took, and which model finally answered. When something breaks at 2am, the trail is already there.
Failover log · last 3 events (live tail)
02:14:06 · 429 rate_limit → 200 · openai/gpt-5.4 · retries: 1 · fallback: anthropic/claude-sonnet-4-6
02:14:09 · 503 capacity → 200 · openai/gpt-5.4 · retries: 0 · fallback: google/gemini-3.1-pro
02:14:12 · timeout 8s → 200 · openai/gpt-5.4 · retries: 0 · fallback: anthropic/claude-sonnet-4-6
Every swap logs trigger, retry count, and which model finally answered.

FAQ

How do I enable fallbacks?
Add a `models` array to your Chat Completions request with an ordered list of backup models. Example: `{ model: 'openai/gpt-5.4', models: ['anthropic/claude-sonnet-4-6', 'google/gemini-3.1-pro'] }`. If the primary returns a 429, a 503, or a provider 5xx, Router retries with the next model in the list.

Can a fallback chain mix providers?
Yes. Router treats every provider the same: OpenAI, Anthropic, Google, Meta, Mistral, Groq, Fireworks. Your fallback chain can mix them freely. The request shape is identical across providers, so your response parser never needs to change.

Does streaming survive a failover?
Yes. SSE streaming passes through Router. If the primary fails mid-stream, Router swaps to the next model and resumes streaming without your client seeing a disconnect. Tool-use and structured-output streams are handled the same way.

Can I keep my existing OpenAI SDK?
Yes. Router is OpenAI-SDK-compatible: change the base URL and keep your code. The `models` field is a native extension of the Chat Completions request; there is no custom client or gateway library to adopt.

How is a failed-over request billed?
You pay only for the model that succeeds. A 429 or 503 before any tokens stream costs nothing. A partial stream that fails over is billed to the primary for the partial response and to the fallback for the completion, and every charge is logged per request.

What gets logged when a failover happens?
Every failover event records the trigger (429, 503, timeout), retry count, backoff duration, primary and fallback models, and final status. Query it via the logs API or watch it live in the portal.

What happens during a prolonged provider outage?
Router tracks provider health continuously. A provider with rolling failures is deprioritized automatically, even inside the same fallback chain, so a persistent outage doesn't add latency to every subsequent request.

What does Router cost?
Router is free during Research Preview: zero markup on underlying model costs and zero fee per failover event.

Four nines. One JSON field.

Cross-provider fallbacks with streaming continuity and transparent retries. No gateway library.
Copyright © 2021-2026 Inworld AI