Chat Agents

Build chat agents that can use any LLM

Hundreds of models through one OpenAI-compatible endpoint. Switch providers with a single field, route by user tier, and upgrade the same agent to voice when you're ready.
Chat session
User
tier: premium

Help me refactor this function

Agent
session.model: anthropic/claude-sonnet-4-6 · tokens: 4,220

Two options: pull the side effect out, or inline the retry loop.

The reasoning layer chat agents actually ship on.

Stop picking a vendor, start picking a model. Route by user tier, swap LLMs mid-flight, upgrade to voice on the same code.
Any LLM, one endpoint

Stop picking a vendor. Start picking a model.

GPT, Claude, Gemini, Llama, Mistral, Grok, and hundreds more behind one OpenAI-compatible endpoint. Switch providers with one field.
session.model = ?               p50 latency
openai/gpt-5.4                  820 ms
anthropic/claude-sonnet-4-6     1.2 s
google/gemini-3.1-pro           950 ms
meta/llama-4-maverick           420 ms
mistral/medium-2508             680 ms
groq/gpt-oss-120b               180 ms
One field swaps the brain. Same OpenAI-compatible endpoint, same streaming.
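As a minimal sketch of what "one field swaps the brain" means in practice: the request body below is the standard OpenAI Chat Completions shape, and only the `model` field changes between providers. The `buildChatRequest` helper is hypothetical, not part of Router's SDK.

```javascript
// Hypothetical helper: build a chat-completions request body for Router.
// Only the `model` field changes between providers; everything else is
// the standard OpenAI Chat Completions shape.
function buildChatRequest(model, messages) {
  return {
    model,        // e.g. "anthropic/claude-sonnet-4-6"
    messages,     // standard role/content message list
    stream: true, // same SSE streaming regardless of provider
  };
}

const messages = [{ role: "user", content: "Help me refactor this function" }];

// Swapping providers is a one-field change:
const fast = buildChatRequest("groq/gpt-oss-120b", messages);
const smart = buildChatRequest("anthropic/claude-sonnet-4-6", messages);
```

Everything downstream (streaming, tool calls, parsing) stays identical, which is what makes the swap safe to do per request.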
Chat at production scale

Built for the traffic real chat agents actually produce.

Router is the reasoning layer behind top chat-agent platforms, coding agents, and premium companion apps. Same infrastructure that serves production voice.
Concurrent sessions, same Router
Companions: 240k · Coding: 180k · Wellness: 140k · Enterprise: 200k
Chat-agent load sits well inside the envelope Router already serves for voice.
One reasoning layer, every use case

Coding, wellness, support, companion: all on Router.

Whatever your agent is doing, you want the freedom to pick the best model for that job today and a different one tomorrow. Router gives you both.
Chat agents shipping on Router
Prompt + character chat platforms: top-tier PLG Router customers
Roleplay + interactive chat: Router traffic at billions of tokens/day
Coding + developer-tool chat: Router for reasoning across models
Wellness + mental-health chat: Router with compliance-ready routing
Text today, voice tomorrow
Works with
Realtime API

The chat agent you ship today becomes the voice agent you ship next quarter.

Your Router-backed chat agent is already half a voice agent. The same session opens to full-duplex audio with a session.update change.
See the Realtime API
Text today, voice tomorrow
Stage 1
Text chat via Router
Stage 2
Same session, plus voice via Realtime API
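The Stage 1 → Stage 2 upgrade could look something like the session.update message below. The source only specifies that STT input and Inworld TTS output are added via session.update; the exact field names and values here are illustrative assumptions, not the documented schema.

```json
{
  "type": "session.update",
  "session": {
    "model": "anthropic/claude-sonnet-4-6",
    "input": { "type": "audio" },
    "output": { "type": "audio", "voice": "inworld-tts" }
  }
}
```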
Your UI doesn't notice

Stream into the same chat bubble you already ship.

Tokens, tool calls, and structured output come back in the shape your front-end already parses. No new rendering path, no new edge cases.
SSE streaming, tool use, structured output
for await (const chunk of stream) {
  const delta = chunk.choices[0].delta;
  if (delta.tool_calls) {
    // ...
  } else if (delta.content) {
    ui.append(delta.content);
  }
}
// OpenAI SSE format, every provider, no translation
Your framework keeps its job

Vercel AI SDK, LangChain, LlamaIndex, nothing changes.

Router speaks the same chat-completions contract every agent framework already knows. Plug it in where you plug in OpenAI and keep shipping.
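A sketch of what "plug it in where you plug in OpenAI" amounts to: the base URL changes, nothing else does. The `routerClientConfig` helper and the base URL below are placeholders, not Router's real endpoint; consult the Router docs for actual values.

```javascript
// Sketch of dropping Router into an existing OpenAI-client setup.
// The base URL and API key here are placeholders.
function routerClientConfig(baseURL, apiKey) {
  return {
    baseURL, // swap this; everything else is unchanged
    apiKey,
    // The chat-completions path stays the same as OpenAI's:
    chatCompletionsURL: new URL("chat/completions", baseURL).toString(),
  };
}

const cfg = routerClientConfig(
  "https://router.example.com/v1/", // placeholder base URL
  process.env.ROUTER_API_KEY
);
// Pass cfg.baseURL and cfg.apiKey to `new OpenAI(...)` (openai npm
// package) or to your framework's OpenAI-compatible provider slot.
```

Because the contract is the chat-completions protocol itself, Vercel AI SDK, LangChain, and LlamaIndex all accept this the same way they accept OpenAI.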
Chat agent in session
User
Explain async/await in Rust like I know JS.
Agent
The core idea is the same, but the runtime isn't baked in.
model: anthropic/claude-sonnet-4-6

FAQ

Which models can I use?
Hundreds: OpenAI GPT, Anthropic Claude, Google Gemini, Meta Llama, Mistral, xAI Grok, Groq, DeepSeek, Qwen, and more. One endpoint, one API key, one bill. See pricing.
Is Router a drop-in replacement for the OpenAI API?
Yes. Router speaks the OpenAI Chat Completions protocol: same endpoint shape, same request body, same streaming, same tool use. Swap the base URL on your OpenAI client and every call works.
Can Router handle production chat traffic?
Router runs on the same infrastructure that handles production voice traffic, so chat-agent load is well within range. Contact sales for enterprise rate limits.
Can I upgrade a chat agent to voice?
Yes. The same session endpoint supports both text-only and full-duplex voice via the Realtime API. Add STT input and Inworld TTS output in the session.update message and your chat becomes a voice agent. Wellness and companion customers have shipped this upgrade.
Can I route different users to different models?
Yes. Route by user tier, geography, prompt category, or any metadata via CEL rules. Sticky per-user routing means the same user always hits the same variant, keeping your A/B math clean. See multi-model routing.
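Tier-based routing of the kind described could be sketched as the table below. The schema is a hypothetical illustration, not Router's documented config format; the `when` conditions use standard CEL syntax, with first-match-wins evaluation assumed.

```javascript
// Hypothetical routing table: schema is illustrative, but the `when`
// conditions are standard CEL expressions evaluated against request
// metadata. Assume first matching rule wins.
const routingRules = [
  { when: 'user.tier == "premium"',       model: "anthropic/claude-sonnet-4-6" },
  { when: 'request.category == "coding"', model: "openai/gpt-5.4" },
  { when: "true",                         model: "meta/llama-4-maverick" }, // default
];
```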
Does Router cache responses?
Yes. Implicit caching on exact matches is free; semantic caching (matching by meaning) is opt-in. Cache hits return in milliseconds at zero token cost. See cost optimization.
What's the difference between a chatbot and a chat agent?
Chatbots and chat agents are the same shape: a message loop backed by an LLM. This page says chat agent because every chatbot customer we have is really running an agent, with memory, tool use, and adaptive behavior. One page, both intents.
What does Router cost?
Router is free during Research Preview, with zero markup on pass-through model costs. You pay the underlying provider rate through a single Inworld bill.

Any LLM today. Full voice tomorrow. One endpoint.

Router-backed chat agents that ship at production scale and upgrade to voice without a rewrite.
Copyright © 2021-2026 Inworld AI