Why Inworld + Vapi for voice AI
- Expressive voices: Choose from dozens of pre-built Inworld voices trained on diverse datasets to capture subtle nuances in tone and prosody. They make AI interactions feel more natural, a level of quality that previously required high-end custom pipelines.
- Multilingual: Build agents in 11 of the most common languages for consumer applications, including English (with its various accents), Chinese, Korean, Dutch, French, Spanish, and more.
- Accessible pricing: Studio-quality voices for just $15 per 1M characters, 75% cheaper than other providers, so you can build engaging experiences that scale with your users.
- Streaming-ready: Inworld voices deliver the first audio chunk in ~200 ms, fast enough for a range of real-time voice agent use cases.
- API-native: Everything on Vapi is exposed as an API, with thousands of configurations and integrations. Plug in your own APIs as tools so your agent can intelligently fetch data and perform actions on your server.
By developers, for developers
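
To make the pricing point above concrete, here is a rough back-of-the-envelope sketch in Python. Only the $15 per 1M characters rate and the "75% cheaper" figure come from this post. The function names, the call volume, and the characters-per-call number are hypothetical values chosen for illustration, and the comparison price is simply what the 75% figure implies, not a quoted rate from any provider.

```python
# Rate taken from the pricing bullet above: $15 per 1M characters.
PRICE_PER_MILLION_CHARS_USD = 15.0


def tts_cost_usd(characters: int,
                 price_per_million: float = PRICE_PER_MILLION_CHARS_USD) -> float:
    """Estimate synthesis cost in USD for a given number of characters."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    return characters / 1_000_000 * price_per_million


def implied_baseline_price(price_per_million: float, discount: float = 0.75) -> float:
    """Per-1M-character price implied by being `discount` cheaper than a baseline."""
    return price_per_million / (1 - discount)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 calls a month, ~1,200 spoken characters per call.
    monthly_chars = 10_000 * 1_200
    baseline = implied_baseline_price(PRICE_PER_MILLION_CHARS_USD)

    print(f"Characters per month: {monthly_chars:,}")
    print(f"Estimated cost at $15/1M: ${tts_cost_usd(monthly_chars):,.2f}")
    print(f"Implied baseline rate: ${baseline:,.2f}/1M")
    print(f"Estimated cost at baseline: ${tts_cost_usd(monthly_chars, baseline):,.2f}")
```

Swap in your own call volume and average response length to get an estimate for your agent. Actual billing may differ depending on how characters are counted.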

Inworld x Vapi collaboration
“We're thrilled to partner with Vapi to bring high-quality, real-time latency voices at a radically more accessible price point to their developer community. By democratizing access to state-of-the-art TTS technology, we're excited to empower the next wave of innovation in voice-first experiences.”
Jean Wang, Inworld Head of Product
“Working with Inworld helps us open up new possibilities for developers building expressive, real-time voice agents. The focus is always on giving builders access to great tools, and this integration fits perfectly with that mission. We're excited to see what our developer community creates next.”
Jordan Dearsley, Vapi Co-Founder













