Native-quality voices in 15 languages, pronunciation scoring via voice profiling, and cross-lingual cloning so one tutor voice speaks every language your app teaches.
The default voice layer for language-learning apps.
Talkpal, ThetaWise, Promova, and LingQ ship on Inworld. Some use TTS only, others run the full pipeline with Router reasoning.
Pronunciation scoring built in
Know how the learner actually sounds, not just what they said.
STT-1 returns a voice profile on every utterance: accent, pace, pitch, style, emotion. Score pronunciation per dimension and track progress across sessions.
Know how the learner actually sounds, not just what they said.
STT-1 returns a voice profile on every utterance: accent, pace, pitch, style, emotion. Score pronunciation per dimension and track progress across sessions.
15 TTS languages for native tutor output. 99+ languages for speaking input via Whisper. Cross-lingual cloning keeps one tutor voice consistent across every language you teach.
15 TTS languages · 99+ for STT
English
Spanish
French
German
Italian
Portuguese
Japanese
Korean
Mandarin
Hindi
Arabic
Polish
Dutch
Russian
Hebrew
One designed tutor voice, every target language your learners study.
15 TTS languages · 99+ for STT
English
Spanish
French
German
Italian
Portuguese
Japanese
Korean
Mandarin
Hindi
Arabic
Polish
Dutch
Russian
Hebrew
One designed tutor voice, every target language your learners study.
15 TTS languages for native tutor output. 99+ languages for speaking input via Whisper. Cross-lingual cloning keeps one tutor voice consistent across every language you teach.
Millions of free-tier learners without blowing the margin.
Voice sits around a few cents per active user per month. Free learners hit cheap models through conditional routing. Premium learners get the best reasoning.
Millions of free-tier learners without blowing the margin.
Voice sits around a few cents per active user per month. Free learners hit cheap models through conditional routing. Premium learners get the best reasoning.
FAQ
Conversation-practice apps, pronunciation tutors, K-12 ed-tech, adult language-learning platforms, and corporate-training products. Talkpal (5M+ learners) is a published Inworld customer; additional language-learning platforms are in production.
TTS across 15 languages (English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Mandarin, Hindi, Arabic, Polish, Dutch, Russian, Hebrew). STT across 99+ via Whisper for speaking input. AssemblyAI covers 6 extra-high-accuracy languages on top.
Yes. Inworld STT-1 returns a voice profile with every utterance, covering accent, pace, vocal style, pitch, emotion, each with confidence scores. Score pronunciation per dimension and give specific feedback. See the emotional understanding page.
Yes. The Realtime API runs full-duplex audio conversation under a second end-to-end. Language-learning customers run live tutor sessions on it today. Semantic turn-taking waits for the learner to finish, barge-in lets them interrupt.
Yes. Cross-lingual cloning preserves the tutor's voice identity across every supported language. One cloned tutor can teach Spanish, French, and Italian in the same character. Voice Design produces new tutor voices from a text description.
Voice sits around a fraction of a cent per learner minute, so a free tier at millions of users still pencils out. The Router is free during research preview, and conditional routing sends free learners to a fast cheap model while premium learners get the best reasoning, all on the same endpoint. See cost optimization.
Yes. K-12 tutoring platforms run on Inworld TTS at production volume today. Child-appropriate voices, alignment timestamps, and on-prem deployment are available for school-district sensitivity.
REST for sync TTS calls, WebSocket for streaming TTS and STT, Realtime API for full-duplex live tutors. OpenAI-compatible LLM endpoint via Router for tutor reasoning. Every Inworld endpoint is usable from any language with HTTP or WebSocket support.
The voice layer 5M+ language learners already use.