Inworld meets Pipecat: Raising the bar for realtime voice AI


The bar for interactive voice AI continues to rise. Today, we're excited to announce that Inworld's Realtime TTS is now fully integrated with Pipecat, the open-source, vendor-neutral framework purpose-built for realtime voice agents and multimodal AI applications. Pipecat helps developers manage the complex orchestration of AI services, conversational behaviors such as interruptions and phrase endpointing, telephony and network transport, cross-platform client libraries, audio processing, and multimodal interactions, all at ultra-low latency. This integration makes it easier than ever to deploy fast agents with emotionally expressive speech by using Inworld's TTS models in your own multimodal AI pipelines.

Whether you're building voice assistants, AI companions, customer support agents, or immersive consumer experiences, Pipecat orchestrates the entire conversational flow, while Inworld supplies natural, expressive voices at a price 75% lower than other providers.

What is Pipecat?

Pipecat is a fully open-source, vendor-neutral framework that lets developers connect speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS) into realtime pipelines. It's designed to power voice-first, multimodal agents with high responsiveness and flexibility.
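To make the pipeline idea concrete, here is a minimal, framework-free sketch in plain Python. It does not use Pipecat's actual API: the `Frame` class, the `fake_stt`, `fake_llm`, and `fake_tts` stages, and `build_pipeline` are all hypothetical stand-ins. It only illustrates the general shape of the approach, where each stage consumes a stream of frames and yields a new stream, so STT, LLM, and TTS can be chained and swapped independently.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, List


@dataclass
class Frame:
    kind: str       # hypothetical frame types: "audio", "transcript", "text", "tts_audio"
    payload: object


# A stage takes an async stream of frames and returns a new async stream.
Processor = Callable[[AsyncIterator[Frame]], AsyncIterator[Frame]]


async def fake_stt(frames: AsyncIterator[Frame]) -> AsyncIterator[Frame]:
    async for f in frames:
        if f.kind == "audio":
            # Stand-in for speech recognition: treat the bytes as UTF-8 text.
            yield Frame("transcript", f.payload.decode("utf-8"))
        else:
            yield f  # pass through frames this stage doesn't handle


async def fake_llm(frames: AsyncIterator[Frame]) -> AsyncIterator[Frame]:
    async for f in frames:
        if f.kind == "transcript":
            # Stand-in for a streaming LLM: emit the reply word by word.
            for word in f"You said: {f.payload}".split():
                yield Frame("text", word)
        else:
            yield f


async def fake_tts(frames: AsyncIterator[Frame]) -> AsyncIterator[Frame]:
    async for f in frames:
        if f.kind == "text":
            # Stand-in for synthesis: each text chunk becomes an "audio" frame.
            yield Frame("tts_audio", f.payload.encode("utf-8"))
        else:
            yield f


async def source(chunks: List[bytes]) -> AsyncIterator[Frame]:
    for chunk in chunks:
        yield Frame("audio", chunk)


def build_pipeline(stages: List[Processor], frames: AsyncIterator[Frame]) -> AsyncIterator[Frame]:
    # Wrap the input stream with each stage in order: STT -> LLM -> TTS.
    for stage in stages:
        frames = stage(frames)
    return frames


async def run(chunks: List[bytes]) -> List[Frame]:
    pipeline = build_pipeline([fake_stt, fake_llm, fake_tts], source(chunks))
    return [f async for f in pipeline]


if __name__ == "__main__":
    frames = asyncio.run(run([b"hello there"]))
    print([f.payload for f in frames])
```

Because every stage shares the same stream-in, stream-out shape, replacing one provider means swapping a single entry in the `stages` list, which is the property the modular, vendor-neutral design described here relies on.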

Pipecat is built for flexibility, and several features make it stand out:

Avoid vendor lock-in: Pipecat is not tightly coupled to any vendor's infrastructure, so you can deploy it on the infrastructure you prefer.
Streaming-first architecture: Pipecat's realtime "frames" model enables TTS to begin speaking before a sentence is even complete, while Realtime TTS delivers rich, emotionally nuanced speech that's virtually indistinguishable from human conversation.
Modular, pluggable pipelines: Connect Inworld's voices to any STT or LLM using Pipecat's modular pipeline. Run multiple models in parallel, test different configurations, and connect your agent to custom logic and databases with built-in tools and advanced function calling.
Native telephony, transport, and client library support: Pipecat offers realtime AI client SDKs for JavaScript, React, iOS, Android, C++, and Python, and deploys across telephony (including native Twilio), WebSockets, SIP, and WebRTC. Whether you're building for web browsers, mobile apps, or traditional phone systems, Pipecat's transport layer adapts to your needs while Inworld's voices maintain consistent quality across all platforms.
Smart Turn v2 model: Get accurate turn detection from native audio. Smart Turn v2 is trained on audio data and takes the speaker's audio as input, so your agents can decide when to respond based on the intonation and pace of the user's speech, then reply with Inworld's rich, expressive voices. The model is also fully open source, including its weights, training script, and datasets.
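As a rough illustration of how an audio-based turn model might fit into an agent's decision loop, the sketch below gates a completion classifier behind a silence check. This is not Smart Turn v2's or Pipecat's interface: `TurnPolicy`, `should_end_turn`, the `predict_complete` callable, and every threshold value are hypothetical assumptions chosen for illustration. The classifier argument simply stands in for a model that scores whether the speaker has finished, for example by hearing falling versus rising intonation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class TurnPolicy:
    min_pause_s: float = 0.2         # hypothetical: minimum pause before asking the model
    fallback_timeout_s: float = 3.0  # hypothetical: end the turn anyway after this much silence
    threshold: float = 0.5           # hypothetical: probability needed to treat the turn as complete


def should_end_turn(
    audio: Sequence[float],
    silence_s: float,
    predict_complete: Callable[[Sequence[float]], float],
    policy: TurnPolicy = TurnPolicy(),
) -> bool:
    """Decide whether the user has finished speaking (illustrative policy only)."""
    if silence_s < policy.min_pause_s:
        return False  # still speaking, or only a very brief pause
    if silence_s >= policy.fallback_timeout_s:
        return True   # long silence: stop waiting even if the model is unsure
    # Otherwise, let the audio model judge from intonation and pace.
    return predict_complete(audio) >= policy.threshold


def rising_intonation(audio: Sequence[float]) -> float:
    return 0.1  # stand-in model output: the speaker sounds like they will continue


def falling_intonation(audio: Sequence[float]) -> float:
    return 0.9  # stand-in model output: the speaker sounds finished


if __name__ == "__main__":
    clip = [0.0] * 160  # placeholder audio samples
    print(should_end_turn(clip, 0.5, rising_intonation))   # False
    print(should_end_turn(clip, 0.5, falling_intonation))  # True
    print(should_end_turn(clip, 3.5, rising_intonation))   # True
```

The point of the design is that the same half-second pause can mean either "done" or "thinking", and an audio-aware model breaks that tie where a silence timer alone cannot.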

Taken together, these features make Pipecat a full orchestration layer for building rich, interactive, responsive AI experiences.

Realtime expression, delivered naturally

Realtime TTS was built for dynamic, emotionally intelligent speech, designed to handle the unpredictability and expressive needs of realtime interaction. Plus, developers get production-ready voice AI for just $15 per 1M characters, 75% less than other providers.
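The pricing claim reduces to simple arithmetic. The sketch below uses the $15 per 1M characters figure stated above; the usage volume in the example (10,000 conversations with about 1,500 characters of agent speech each) is a hypothetical assumption, and `implied_baseline` only derives the comparison price that the "75% less" claim implies, not any provider's quoted price.

```python
PRICE_PER_MILLION_CHARS = 15.00  # USD, the Realtime TTS price quoted in this article


def tts_cost(characters: int, price_per_million: float = PRICE_PER_MILLION_CHARS) -> float:
    """Cost in USD to synthesize `characters` characters of speech."""
    return characters / 1_000_000 * price_per_million


def implied_baseline(price: float, discount: float) -> float:
    """Baseline price implied by `price` being `discount` cheaper.

    price = baseline * (1 - discount)  =>  baseline = price / (1 - discount)
    """
    return price / (1 - discount)


if __name__ == "__main__":
    # Hypothetical volume: 10,000 conversations x 1,500 characters of agent speech.
    chars = 10_000 * 1_500
    print(f"${tts_cost(chars):.2f}")                  # $225.00
    print(f"${implied_baseline(15.00, 0.75):.2f}")    # $60.00 per 1M characters
```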

With Realtime TTS in a Pipecat pipeline, you get:

Millisecond-scale audio synthesis that starts streaming before sentences are complete
Custom and pre-built voices across 11 languages, with more coming soon
Natural vocalizations and emotional intelligence that adapts to conversational context
Consistent low latency even with complex multi-step reasoning or function calls
Zero-shot voice cloning from just seconds of audio, available free to all users
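One common way to start audio before a full reply exists is to buffer streamed LLM tokens and flush them to TTS at phrase boundaries. The sketch below shows that idea with a hypothetical `phrase_chunks` helper and a hypothetical set of break characters; it is not Pipecat's built-in text aggregator or Inworld's API, just a self-contained illustration of the technique.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical flush points: clause and sentence punctuation.
PHRASE_BREAKS: Tuple[str, ...] = (".", "!", "?", ",", ";", ":")


def phrase_chunks(tokens: Iterable[str], breaks: Tuple[str, ...] = PHRASE_BREAKS) -> Iterator[str]:
    """Lazily group streamed tokens into phrases that can be sent to TTS early."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush as soon as the buffered text ends at a phrase boundary,
        # without waiting for the rest of the LLM response.
        if buffer.rstrip().endswith(breaks):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


def llm_tokens() -> Iterator[str]:
    # Stand-in for a streaming LLM response.
    yield from ["Sure", ",", " I", " can", " help", ".", " What", " time", " works", "?"]


if __name__ == "__main__":
    for phrase in phrase_chunks(llm_tokens()):
        print("send to TTS:", phrase)
```

Because `phrase_chunks` is a generator, the first phrase ("Sure,") is available after only two tokens. A production aggregator would also need to handle cases this sketch ignores, such as abbreviations ("Dr.") and decimals ("3.5").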

Together, Pipecat and Inworld make it possible to carry on a fluid, engaging conversation in your browser, on the phone, or anywhere voice is used.

Start building today

This integration reflects both companies' commitment to democratizing access to cutting-edge AI technology. Pipecat's open-source approach removes barriers to experimentation and deployment, while Inworld's accessible pricing ensures that high-quality voice AI isn't limited to well-funded enterprises.

The future of voice AI is expressive, accessible, and realtime. With Inworld + Pipecat, that future is available right now.

Explore Pipecat documentation

Learn about Realtime TTS capabilities

Try Realtime TTS in the playground

Join the Pipecat Discord community