Inworld Agent Runtime

Build realtime voice and chat agents for demanding applications. Use integrated metrics and experiments to optimize for user outcomes. Deploy to hosted API endpoints or integrate via SDKs.

Realtime agents built for scale

Build with production-grade orchestration and rapid inference.

Model-Agnostic Orchestration

Lightning-fast C++-based orchestration that provides unified access to leading models: LLM, TTS, STT, tools, and more.

A technical flowchart of a multi-step data processing system with nodes for input, processing, and output.

Integrated Observability for Measurable Gains

Easily monitor performance, costs, and user patterns on every interaction.

A bar graph charting the average session length over the last 30 seconds.

Improve User Engagement with Experiments

Instantly deploy new models and prompts and measure impact on user metrics.

A UI element with a toggle to enable an experiment comparing the Anthropic and OpenAI AI models.
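
One common way to run this kind of experiment is to assign each user to a variant deterministically, so the same user always sees the same model or prompt while metrics are compared. The sketch below is a minimal illustration under that assumption, not Inworld's implementation; the function name `assign_variant`, the experiment name, and the variant labels are all hypothetical.

```python
import hashlib
from typing import Dict


def assign_variant(user_id: str, experiment: str, split: Dict[str, int]) -> str:
    """Map a user to a variant; `split` holds percentages that sum to 100."""
    if sum(split.values()) != 100:
        raise ValueError("split percentages must sum to 100")
    # Hash the experiment name together with the user id so that different
    # experiments bucket the same user independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in [0, 100)
    threshold = 0
    for variant, percent in split.items():
        threshold += percent
        if bucket < threshold:
            return variant
    raise AssertionError("unreachable when percentages sum to 100")


# Hypothetical 50/50 experiment comparing two prompts.
split = {"control-prompt": 50, "candidate-prompt": 50}
assignments = [assign_variant(f"user-{i}", "greeting-test", split) for i in range(1000)]
```

Because the assignment depends only on the user id and the experiment name, no per-user state needs to be stored to keep the experience consistent across sessions.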

Why Inworld Agent Runtime

Your users deserve the best quality, availability and speed

Exceptional Quality

Serve personalized models and prompts to delight every user.

A decision tree diagram mapping user interests to different personality traits for personalization.

High Availability

Automatic failovers prevent downtime from outages and rate limits.

A conceptual diagram showing the 'anthropic/claude-sonnet-4' AI model as a central component that connects to or is comparable with models from Google Gemini, Anthropic, and OpenAI.
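
The failover idea can be sketched as trying providers in priority order and moving to the next one when a call fails. This is a simplified illustration rather than the runtime's actual mechanism; `ProviderError`, `call_with_failover`, and the two stand-in providers are hypothetical names invented for the example.

```python
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Hypothetical error raised by a provider call on an outage or rate limit."""


def call_with_failover(
    prompt: str, providers: List[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember the failure, try the next one
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    # Stand-in for a provider that is currently rate limited.
    raise ProviderError("rate limited")


def backup(prompt: str) -> str:
    # Stand-in for a healthy fallback provider.
    return f"echo: {prompt}"


used, reply = call_with_failover("hello", [("primary", primary), ("backup", backup)])
```

Here the call falls through to the backup provider, so the caller still gets a reply even though the first provider failed.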

Ultra Low Latency

Lightning-fast execution that scales seamlessly from 10 to 10M users with minimal code changes.

A green line graph showing past fluctuations and a projected upward trend.

Proven Results

Built for every consumer AI application - from social apps and games to learning and wellness.

Wishroll Status

Went from prototype to 1M users in 19 days with a 20x cost reduction

Little Umbrella

From a 1.2 billion token bill to profitability with 20 million players

Streamlabs

Built a realtime multimodal streaming assistant with sub-500 ms latency

Bible Chat

Increased voice AI feature engagement and reached millions

Built for every consumer AI application

Agent Runtime scales consumer AI, driving experiences from social apps and games to learning and wellness.

Social & Community Apps

AI-powered social discovery, content moderation, and personalized feeds that understand context and scale to millions of users

Education & Learning

Voice-enabled language tutors, adaptive study companions, and intelligent content generation that meets each learner where they are

Health & Wellness

24/7 mental health companions, personalized fitness coaching, and health assistants that understand individual needs and privacy

Gaming & Interactive Media

Dynamic NPCs, interactive storytelling, and immersive experiences powered by AI that scales from indie games to AAA studios

Get started

Inworld easily integrates with any existing stack or provider (Anthropic, Google, Mistral, OpenAI, etc.) via one API key.

Available to everyone now.

FAQ

The Inworld Agent Runtime uniquely combines:

a lightning-fast C++ core for realtime multimodal conversational interactions
built-in telemetry for deep user insights (traces and logs)
live A/B testing to accelerate improvements to the end-user experience

Yes, Agent Runtime is free. The runtime itself incurs no cost or license fee; you pay only for the models you consume. Learn more about model pricing here.

Follow our quick start guide to deploy a realtime conversational AI endpoint in 3 minutes, then integrate it into your app.

Inworld Agent Runtime is specifically designed for developers building realtime conversational AI and voice agents that scale to millions of concurrent users.

Use cases include language tutors, social media, AI companions, game characters, fitness coaches, shopping agents, and more.

Yes, you can use the Inworld CLI to deploy a hosted endpoint that can be easily called by any part of your existing stack.

Developers get a full suite of pre-optimized nodes to construct any realtime AI pipeline that can scale to millions of users, including nodes for:

model I/O (STT, LLM, TTS)
data engineering (prompt building, chunking)
flow logic (keyword matching, safety)
external tool calls (MCP integrations)
and more
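
To make the node idea concrete, here is a toy sketch of a voice pipeline built from small composable steps. It does not use the Inworld SDK: every node below is a hypothetical stand-in (the "STT" node just decodes bytes, the "LLM" node just uppercases text), and a real pipeline would be a graph rather than the simple linear chain assumed here.

```python
from typing import Any, Callable, List

Node = Callable[[Any], Any]

BLOCKED_KEYWORDS = {"forbidden"}  # illustrative safety list


def run_pipeline(nodes: List[Node], payload: Any) -> Any:
    """Pass the payload through each node in order."""
    for node in nodes:
        payload = node(payload)
    return payload


def stt(audio: bytes) -> str:
    # Stand-in for speech-to-text: treat the audio bytes as UTF-8 text.
    return audio.decode("utf-8")


def safety(text: str) -> str:
    # Flow logic: keyword matching that replaces unsafe input.
    if any(word in text.lower() for word in BLOCKED_KEYWORDS):
        return "[blocked]"
    return text


def build_prompt(text: str) -> str:
    # Data engineering: wrap the transcript in a prompt template.
    return f"User said: {text}"


def llm(prompt: str) -> str:
    # Stand-in for a model call.
    return prompt.upper()


def tts(text: str) -> bytes:
    # Stand-in for text-to-speech: encode the reply as bytes.
    return text.encode("utf-8")


result = run_pipeline([stt, safety, build_prompt, llm, tts], b"hello there")
# result == b"USER SAID: HELLO THERE"
```

Keeping each node a plain function with one input and one output is what lets a model, a safety rule, or a prompt template be swapped without touching the rest of the chain.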

Use Inworld’s state-of-the-art TTS models alongside your preferred LLM.

We support all major providers (OpenAI, Anthropic, Google, Mistral) and low-latency platforms like Fireworks, Groq, and Tenstorrent. See the supported models here.