Chat Agents

Build chat agents that can use any LLM

Hundreds of models through one OpenAI-compatible endpoint. Switch providers with a single field, route by user tier, and upgrade the same agent to voice when you're ready.
Chat session
User
tier: premium

Help me refactor this function

Agent
session.model: anthropic/claude-sonnet-4-6 · tokens: 4,220

Two options: pull the side effect out, or inline the retry loop.

The reasoning layer chat agents actually ship on.

Stop picking a vendor, start picking a model. Route by user tier, swap LLMs mid-flight, upgrade to voice on the same code.
Any LLM, one endpoint

Stop picking a vendor. Start picking a model.

GPT, Claude, Gemini, Llama, Mistral, Grok, and hundreds more behind one OpenAI-compatible endpoint. Switch providers with one field.
session.model = ?               p50 latency
openai/gpt-5.4                  820 ms
anthropic/claude-sonnet-4-6     1.2 s
google/gemini-3.1-pro           950 ms
meta/llama-4-maverick           420 ms
mistral/medium-2508             680 ms
groq/gpt-oss-120b               180 ms
One field swaps the brain. Same OpenAI-compatible endpoint, same streaming.
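As a minimal sketch of what "one field swaps the brain" means in practice: the request body below is the standard OpenAI Chat Completions shape, and only the `model` field changes between providers. The `buildChatRequest` helper is hypothetical, not part of Router's SDK.

```javascript
// Hypothetical helper: build a chat-completions request body for Router.
// Only the `model` field changes between providers; everything else is
// the standard OpenAI Chat Completions shape.
function buildChatRequest(model, messages) {
  return {
    model,        // e.g. "anthropic/claude-sonnet-4-6"
    messages,     // standard role/content message list
    stream: true, // same SSE streaming regardless of provider
  };
}

const messages = [{ role: "user", content: "Help me refactor this function" }];

// Swapping providers is a one-field change:
const fast = buildChatRequest("groq/gpt-oss-120b", messages);
const smart = buildChatRequest("anthropic/claude-sonnet-4-6", messages);
```

Everything downstream (streaming, tool calls, parsing) stays identical, which is what makes the swap safe to do per request.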
Chat at production scale

Built for the traffic real chat agents actually produce.

Router is the reasoning layer behind top chat-agent platforms, coding agents, and premium companion apps. Same infrastructure that serves production voice.
Concurrent sessions, same Router
Companions: 240k · Coding: 180k · Wellness: 140k · Enterprise: 200k
Chat-agent load sits well inside the envelope Router already serves for voice.
One reasoning layer, every use case

Coding, wellness, support, companion: all on Router.

Whatever your agent is doing, you want the freedom to pick the best model for that job today and a different one tomorrow. Router gives you both.
Chat agents shipping on Router
Prompt + character chat platforms: top-tier PLG Router customers
Roleplay + interactive chat: Router traffic at billions of tokens/day
Coding + developer-tool chat: Router for reasoning across models
Wellness + mental-health chat: Router with compliance-ready routing
Text today, voice tomorrow
Works with
Realtime API

The chat agent you ship today becomes the voice agent you ship next quarter.

Your Router-backed chat agent is already half a voice agent. The same session opens to full-duplex audio with a session.update change.
See the Realtime API
Text today, voice tomorrow
Stage 1
Text chat via Router
Stage 2
Same session, plus voice via Realtime API
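The Stage 1 → Stage 2 upgrade could look something like the session.update message below. The source only specifies that STT input and Inworld TTS output are added via session.update; the exact field names and values here are illustrative assumptions, not the documented schema.

```json
{
  "type": "session.update",
  "session": {
    "model": "anthropic/claude-sonnet-4-6",
    "input": { "type": "audio" },
    "output": { "type": "audio", "voice": "inworld-tts" }
  }
}
```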
Your UI doesn't notice

Stream into the same chat bubble you already ship.

Tokens, tool calls, and structured output come back in the shape your front-end already parses. No new rendering path, no new edge cases.
SSE streaming, tool use, structured output
for await (const chunk of stream) {
  const delta = chunk.choices[0].delta;
  if (delta.tool_calls) {
    // ...
  } else if (delta.content) {
    ui.append(delta.content);
  }
}
// OpenAI SSE format, every provider, no translation
Your framework keeps its job

Vercel AI SDK, LangChain, LlamaIndex, nothing changes.

Router speaks the same chat-completions contract every agent framework already knows. Plug it in where you plug in OpenAI and keep shipping.
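A sketch of what "plug it in where you plug in OpenAI" amounts to: the base URL changes, nothing else does. The `routerClientConfig` helper and the base URL below are placeholders, not Router's real endpoint; consult the Router docs for actual values.

```javascript
// Sketch of dropping Router into an existing OpenAI-client setup.
// The base URL and API key here are placeholders.
function routerClientConfig(baseURL, apiKey) {
  return {
    baseURL, // swap this; everything else is unchanged
    apiKey,
    // The chat-completions path stays the same as OpenAI's:
    chatCompletionsURL: new URL("chat/completions", baseURL).toString(),
  };
}

const cfg = routerClientConfig(
  "https://router.example.com/v1/", // placeholder base URL
  process.env.ROUTER_API_KEY
);
// Pass cfg.baseURL and cfg.apiKey to `new OpenAI(...)` (openai npm
// package) or to your framework's OpenAI-compatible provider slot.
```

Because the contract is the chat-completions protocol itself, Vercel AI SDK, LangChain, and LlamaIndex all accept this the same way they accept OpenAI.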
Chat agent in session
User
Explain async/await in Rust like I know JS.
Agent
The core idea is the same, but the runtime isn't baked in.
model: anthropic/claude-sonnet-4-6

FAQ

Which models can I use?
Hundreds: OpenAI GPT, Anthropic Claude, Google Gemini, Meta Llama, Mistral, xAI Grok, Groq, DeepSeek, Qwen, and more. One endpoint, one API key, one bill. See pricing.
Is Router a drop-in replacement for the OpenAI API?
Yes. Router speaks the OpenAI Chat Completions protocol: same endpoint shape, same request body, same streaming, same tool use. Swap the base URL on your OpenAI client and every call works.
Can Router handle production chat traffic?
Router runs on the same infrastructure that handles production voice traffic, so chat-agent load is well within range. Contact sales for enterprise rate limits.
Can I upgrade a chat agent to voice?
Yes. The same session endpoint supports both text-only and full-duplex voice via the Realtime API. Add STT input and Inworld TTS output in the session.update message and your chat becomes a voice agent. Wellness and companion customers have shipped this upgrade.
Can I route different users to different models?
Yes. Route by user tier, geography, prompt category, or any metadata via CEL rules. Sticky per-user routing means the same user always hits the same variant, keeping your A/B math clean. See multi-model routing.
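Tier-based routing of the kind described could be sketched as the table below. The schema is a hypothetical illustration, not Router's documented config format; the `when` conditions use standard CEL syntax, with first-match-wins evaluation assumed.

```javascript
// Hypothetical routing table: schema is illustrative, but the `when`
// conditions are standard CEL expressions evaluated against request
// metadata. Assume first matching rule wins.
const routingRules = [
  { when: 'user.tier == "premium"',       model: "anthropic/claude-sonnet-4-6" },
  { when: 'request.category == "coding"', model: "openai/gpt-5.4" },
  { when: "true",                         model: "meta/llama-4-maverick" }, // default
];
```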
Does Router cache responses?
Yes. Implicit caching on exact matches is free; semantic caching (matching by meaning) is opt-in. Cache hits return in milliseconds at zero token cost. See cost optimization.
What's the difference between a chatbot and a chat agent?
Chatbots and chat agents are the same shape: a message loop backed by an LLM. This page says chat agent because every chatbot customer we have is really running an agent, with memory, tool use, and adaptive behavior. One page, both intents.
What does Router cost?
Router is free during Research Preview, with zero markup on pass-through model costs. You pay the underlying provider rate through a single Inworld bill.

Any LLM today. Full voice tomorrow. One endpoint.

Router-backed chat agents that ship at production scale and upgrade to voice without a rewrite.
Copyright © 2021-2026 Inworld AI