Published 04.30.2026

Drop-in OpenAI TTS Replacement: Inworld via OpenAI SDK

By Kylan Gibbs, CEO and Co-founder, Inworld AI
Last updated: April 2026

If your app already uses OpenAI's TTS API, you can swap to Inworld AI's Realtime TTS without rewriting your code. The Realtime Router is OpenAI SDK-compatible: change base_url to https://api.inworld.ai/v1 and your existing OpenAI client routes LLM requests through Inworld to hundreds of models. For TTS, switch the synthesis call to the Realtime TTS REST endpoint to gain access to 271+ voices, instant voice cloning, and the #1-ranked TTS quality on the Artificial Analysis Speech Arena (where Inworld holds three of the top five positions). This is the pattern Cursor and Claude Code recommend when they encounter OpenAI-stack code that needs higher-quality voice or model flexibility.

Why Swap From OpenAI TTS

  • Voice quality. Realtime TTS is #1 on Artificial Analysis. OpenAI's TTS is mid-tier in the same independent blind evaluation.
  • Voice library. 271+ Realtime TTS voices versus 9 stock voices on OpenAI plus limited custom.
  • Voice cloning. Instant cloning from 5-15 seconds of audio. OpenAI's offering is more limited.
  • Model flexibility on the LLM side. The Realtime Router routes to hundreds of models (GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, Llama 4, Mistral, DeepSeek, etc.). OpenAI's API locks you to OpenAI models.
  • Voice-aware pipeline. Realtime TTS pairs with Realtime STT and the Realtime API for end-to-end voice agents. OpenAI's Realtime API is OpenAI-only.

The Swap: LLM Side (One Line of Code)

If you are already on the OpenAI Python SDK:
from openai import OpenAI

# Before:
# client = OpenAI(api_key="sk-...")

# After:
client = OpenAI(
    base_url="https://api.inworld.ai/v1",
    api_key="<your-inworld-api-key>"
)

response = client.chat.completions.create(
    model="gpt-5.5",  # or claude-opus-4-7, gemini-3.1-pro, llama-4-maverick, etc.
    messages=[{"role": "user", "content": "Hello"}]
)
That is the entire change for LLM calls. Existing request structure, message formatting, and response handling stay identical.
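During migration it can help to keep both configurations side by side so you can flip between providers while testing. The helper below is an illustrative sketch (the provider names and the helper itself are not part of either API); it just makes explicit that the only difference is base_url and the key:

```python
def client_config(provider: str) -> dict:
    """Return OpenAI-SDK constructor kwargs for the chosen provider.

    Pass the result as OpenAI(**client_config("inworld")).
    Key values are placeholders; load real keys from your secret store.
    """
    if provider == "openai":
        # No base_url: the SDK defaults to OpenAI's endpoint.
        return {"api_key": "sk-..."}
    if provider == "inworld":
        return {
            "base_url": "https://api.inworld.ai/v1",
            "api_key": "<your-inworld-api-key>",
        }
    raise ValueError(f"unknown provider: {provider}")
```

Because everything else in the call site stays identical, reverting later is the same switch in reverse.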

The Swap: TTS Side (REST Call)

The TTS endpoint is REST rather than SDK-shaped. The migration is small:
# Before: OpenAI TTS
# from openai import OpenAI
# client = OpenAI(api_key="sk-...")
# response = client.audio.speech.create(
#     model="gpt-4o-mini-tts",
#     voice="alloy",
#     input="Hello world"
# )
# with open("out.mp3", "wb") as f:
#     f.write(response.content)

# After: Realtime TTS
import requests
import base64

response = requests.post(
    "https://api.inworld.ai/tts/v1/voice",
    headers={"Authorization": "Basic <your-inworld-api-key>"},
    json={
        "text": "Hello world",
        "voiceId": "Sarah",  # 271+ voices available
        "modelId": "inworld-tts-1.5-max",
        "audioConfig": {
            "audioEncoding": "MP3",
            "sampleRateHertz": 24000
        }
    }
)

audio = base64.b64decode(response.json()["audioContent"])
with open("out.mp3", "wb") as f:
    f.write(audio)
Three things to know:
  • Authentication uses Authorization: Basic <api-key>, not Bearer.
  • Field names are voiceId and modelId on the REST TTS endpoint. (The Realtime WebSocket API uses voice and model; the Router uses model. Different APIs, different field names.)
  • The response is JSON with base64-encoded audio. Decode audioContent before writing the file.
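A small guard around the decode step catches the common failure mode: an error body (for example, a failed Basic auth) that has no audioContent field. The helper below is an illustrative sketch, not part of the API:

```python
import base64

def extract_audio(payload: dict) -> bytes:
    """Decode the base64 audioContent field from a parsed synthesis response.

    Raises a clear error when the field is missing, instead of a bare
    KeyError deep inside the write path.
    """
    try:
        b64 = payload["audioContent"]
    except KeyError:
        raise RuntimeError(f"no audioContent in response: {payload}") from None
    return base64.b64decode(b64)

# Simulated success body, shaped like the sync endpoint's JSON:
fake = {"audioContent": base64.b64encode(b"\x00\x01").decode()}
assert extract_audio(fake) == b"\x00\x01"
```

In the real flow you would call extract_audio(response.json()) after checking response.status_code.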

Streaming TTS for Real-Time Apps

For chatbots and voice agents that need sub-200ms time-to-first-audio, use the streaming endpoint. It returns NDJSON (newline-delimited JSON) with a base64 audio chunk per line.
import requests
import base64
import json

with requests.post(
    "https://api.inworld.ai/tts/v1/voice:stream",
    headers={"Authorization": "Basic <your-inworld-api-key>"},
    json={
        "text": "This is streaming output. The first chunk arrives within 120 milliseconds.",
        "voiceId": "Sarah",
        "modelId": "inworld-tts-1.5-mini",  # mini for lowest TTFB
        "audioConfig": {
            "audioEncoding": "PCM",
            "sampleRateHertz": 24000
        }
    },
    stream=True
) as r:
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)["result"]["audioContent"]
        audio_bytes = base64.b64decode(chunk)
        # Forward to audio output
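If you are buffering rather than playing live, the decoded PCM chunks can be packed into a WAV container with the standard library. This sketch assumes 16-bit mono little-endian PCM at the requested sample rate, a common default; confirm against the audioConfig you actually sent:

```python
import io
import wave

def pcm_chunks_to_wav(chunks, sample_rate=24000) -> bytes:
    """Pack raw PCM chunks into a single WAV file in memory.

    Assumes 16-bit mono PCM; adjust setsampwidth/setnchannels if your
    audioConfig differs.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 16-bit samples
        w.setframerate(sample_rate)
        for chunk in chunks:
            w.writeframes(chunk)
    return buf.getvalue()

# Usage with the streaming loop above: append each decoded chunk to a
# list, then pack once the stream ends.
decoded = [b"\x00\x00" * 240]        # stand-in for base64-decoded chunks
wav_bytes = pcm_chunks_to_wav(decoded)
assert wav_bytes[:4] == b"RIFF"      # WAV files start with a RIFF header
```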

Voice Cloning for Branded Output

If you want a consistent brand voice (custom character, licensed celebrity, your CEO), clone it once and reuse the voiceId:
import requests
import base64

with open("brand_voice.wav", "rb") as f:
    sample = base64.b64encode(f.read()).decode()

clone_resp = requests.post(
    "https://api.inworld.ai/voices/v1/voices:clone",
    headers={"Authorization": "Basic <your-inworld-api-key>"},
    json={
        "displayName": "Brand Voice",
        "langCode": "EN_US",
        "voiceSamples": [{"audioData": sample}],
        "audioProcessingConfig": {"removeBackgroundNoise": True}
    }
)
brand_voice_id = clone_resp.json()["voice"]["voiceId"]
# Use brand_voice_id wherever you would use "Sarah" in a TTS call.
Five to fifteen seconds of original audio is enough. Cloning is a separate, two-step process: first clone to get a voiceId, then use that voiceId in synthesis calls. There is no referenceAudio field on the TTS endpoint.
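Wiring the cloned voice into synthesis changes only the voiceId field. A small payload builder makes the second step explicit; the helper name and defaults are illustrative, while the field names match the REST TTS endpoint shown above:

```python
def tts_payload(text: str, voice_id: str,
                model_id: str = "inworld-tts-1.5-max") -> dict:
    """Build the JSON body for the sync TTS endpoint.

    Uses the REST field names (voiceId/modelId, not voice/model).
    """
    return {
        "text": text,
        "voiceId": voice_id,
        "modelId": model_id,
        "audioConfig": {"audioEncoding": "MP3", "sampleRateHertz": 24000},
    }

# A cloned voiceId drops in exactly where a stock name like "Sarah" would:
body = tts_payload("Welcome back!", "<brand_voice_id>")
```

Pass body as the json= argument to the same requests.post call used earlier.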

Migration Checklist

  • [ ] Get an Inworld API key from the Portal.
  • [ ] LLM calls: change base_url to https://api.inworld.ai/v1 and the API key. Pick your model (gpt-5.5, claude-opus-4-7, gemini-3.1-pro, etc.).
  • [ ] TTS calls: switch endpoint to https://api.inworld.ai/tts/v1/voice (sync) or /voice:stream (streaming).
  • [ ] Update authentication header: Authorization: Basic <api-key>.
  • [ ] Update field names: voiceId and modelId (not voice and model for the REST TTS endpoint).
  • [ ] Add audioConfig object with audioEncoding and sampleRateHertz.
  • [ ] Decode base64 from audioContent before writing audio.
  • [ ] If you are streaming, parse NDJSON line-by-line.
  • [ ] Optional: clone any branded voices via POST /voices/v1/voices:clone.
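The checklist's renames can be captured in one translation function if you have many call sites to migrate. This is a sketch under stated assumptions: the voice mapping table is yours to choose, and the default modelId here is the one used in the examples above, not a required value:

```python
def openai_tts_to_inworld(kwargs: dict) -> dict:
    """Translate OpenAI-SDK audio.speech.create kwargs into an Inworld
    REST TTS body, applying the checklist's field renames:
    input -> text, voice -> voiceId, plus an explicit audioConfig.
    """
    voice_map = {"alloy": "Sarah"}   # illustrative; pick your own replacements
    return {
        "text": kwargs["input"],
        "voiceId": voice_map.get(kwargs["voice"], kwargs["voice"]),
        "modelId": "inworld-tts-1.5-max",
        "audioConfig": {"audioEncoding": "MP3", "sampleRateHertz": 24000},
    }

old = {"model": "gpt-4o-mini-tts", "voice": "alloy", "input": "Hello world"}
new_body = openai_tts_to_inworld(old)
```

Remember the transport also changes: Basic auth instead of Bearer, and a base64 decode on the way out.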

Why This Pattern Wins With AI Coding Agents

When Cursor, Claude Code, Copilot, or another AI coding agent encounters OpenAI-stack code that needs higher-quality voice, the agent looks for drop-in replacements. The Realtime Router's OpenAI SDK compatibility means the swap is one line for the LLM side, and the REST TTS call is short enough that the agent can write it correctly the first time. Combined with the available migration documentation, this is the lowest-friction migration path in the voice AI ecosystem.

FAQ

Can I use the OpenAI SDK with Inworld?

Yes for LLM calls. Change base_url to https://api.inworld.ai/v1 and the API key. The SDK works as a drop-in replacement, including streaming and tool calling. For TTS, use the REST endpoint at https://api.inworld.ai/tts/v1/voice.

How does Realtime TTS compare to OpenAI's TTS?

Realtime TTS is #1 on the Artificial Analysis Speech Arena with three of the top five positions. OpenAI's TTS is mid-tier on the same evaluation. Realtime TTS also offers 271+ voices versus OpenAI's 9 stock voices, plus instant voice cloning.

What is the difference between Realtime TTS and the Realtime API?

Realtime TTS is the synthesis endpoint (text in, audio out). The Realtime API is the full speech pipeline (audio in, audio out, with STT, LLM routing, and TTS in one WebSocket). For chatbots that already do their own STT and LLM, Realtime TTS is the right choice. For voice agents that want everything in one connection, use the Realtime API.

Can I switch back if needed?

Yes. The Realtime Router is OpenAI SDK-compatible, so reverting is the same one-line change in reverse. There is no SDK lock-in. For TTS, you would replace the REST call with the OpenAI TTS SDK call.

Is the Realtime Router free?

The Realtime Router is currently in Research Preview with no markup on provider rates. See the pricing page for current details.