Question 1

Who uses Inworld for language learning?

Accepted Answer

Conversation-practice apps, pronunciation tutors, K-12 ed-tech, adult language-learning platforms, and corporate-training products. Talkpal (5M+ learners) is a published Inworld customer; additional language-learning platforms are in production.

Question 2

Which languages are supported?

Accepted Answer

TTS across over 100 languages with Realtime TTS-2 (including English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Mandarin, Hindi, Arabic, Polish, Dutch, Russian, Hebrew). STT across 99+ via Whisper for speaking input. AssemblyAI covers 6 extra-high-accuracy languages on top.

Question 3

Can it score pronunciation?

Accepted Answer

Yes. Realtime STT-1 returns a voice profile with every utterance, covering accent, pace, vocal style, pitch, emotion, each with confidence scores. Score pronunciation per dimension and give specific feedback. See the emotional understanding page.

Question 4

Can I build a live speaking tutor?

Accepted Answer

Yes. The Realtime API runs full-duplex audio conversation under a second end-to-end. Language-learning customers run live tutor sessions on it today. Semantic turn-taking waits for the learner to finish, barge-in lets them interrupt.

Question 5

Can I use the same tutor voice across languages?

Accepted Answer

Yes. Cross-lingual cloning preserves the tutor's voice identity across every supported language. One cloned tutor can teach Spanish, French, and Italian in the same character. Voice Design produces new tutor voices from a text description.

Question 6

How does the cost math work at consumer scale?

Accepted Answer

Voice sits around a fraction of a cent per learner minute, so a free tier at millions of users still pencils out. The Router is free during research preview, and conditional routing sends free learners to a fast cheap model while premium learners get the best reasoning, all on the same endpoint. See cost optimization.

Question 7

Does this work for K-12 education?

Accepted Answer

Yes. K-12 tutoring platforms run on Realtime TTS at production volume today. Child-appropriate voices, alignment timestamps, and on-prem deployment are available for school-district sensitivity.

Question 8

What does the integration look like?

Accepted Answer

REST for sync TTS calls, WebSocket for streaming TTS and STT, Realtime API for full-duplex live tutors. OpenAI-compatible LLM endpoint via Router for tutor reasoning. Every Inworld endpoint is usable from any language with HTTP or WebSocket support.

Build a language tutor learners speak to daily

The voice stack the category already picked.

Published case study: Talkpal chose Inworld over every alternative.

Published case study: Talkpal chose Inworld over every alternative.

The default voice layer for language-learning apps.

The default voice layer for language-learning apps.

Know how the learner actually sounds, not just what they said.

Know how the learner actually sounds, not just what they said.

Every target language learners study.

Every target language learners study.

Full-duplex conversation practice in under a second.

Full-duplex conversation practice in under a second.

Millions of free-tier learners without blowing the margin.

Millions of free-tier learners without blowing the margin.

FAQ

The voice layer 5M+ language learners already use.