Emotional Understanding

Voice AI that hears how people feel

Your agent hears the tone, not just the words. It reads how someone sounds on every utterance, then lets that signal flow through the LLM and into the reply, so every response sounds like someone was actually listening.
Voice profile
Utterance
profile.emotion stressed (0.87)
profile.style tentative (0.78)

I've been putting this off for months.

Agent
tts.tone calm, slower
voice Luna

Let's take a breath and pick one thing together.

Voice AI that actually sounds like it's listening.

Tone, age, accent, and mood travel with every utterance, and the reply adapts to match.
Every utterance, understood

Your agent reads who's speaking and how they feel, automatically.

Every transcript from STT-1 ships with a voice profile plus confidence scores. Threshold it, decay it, reason with it. Not a label. A signal.
Every utterance, understood with confidence

Emotion: stressed (0.87)
Age: 30s (0.92)
Accent: British (RP) (0.88)
Vocal style: measured (0.81)
Pitch: medium-low (0.90)
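The threshold-it-yourself pattern can be sketched in a few lines. This follows the `{ value, confidence }` shape of the `voiceProfile` returned by the transcription example on this page; the per-dimension cutoffs themselves are arbitrary choices for illustration, not API defaults.

```javascript
// Keep only the profile dimensions we can trust, given a per-dimension cutoff.
// Cutoffs are illustrative, not API defaults.
const MIN_CONFIDENCE = { emotion: 0.8, age: 0.9, accent: 0.85, pitch: 0.8, vocalStyle: 0.85 };

function trustedSignals(voiceProfile) {
  const trusted = {};
  for (const [dim, signal] of Object.entries(voiceProfile)) {
    // Unknown dimensions get a conservative default cutoff.
    if (signal.confidence >= (MIN_CONFIDENCE[dim] ?? 0.9)) {
      trusted[dim] = signal.value;
    }
  }
  return trusted;
}

// The profile card above, with confidence as numbers between 0 and 1:
const profile = {
  emotion:    { value: 'stressed',     confidence: 0.87 },
  age:        { value: '30s',          confidence: 0.92 },
  accent:     { value: 'British (RP)', confidence: 0.88 },
  vocalStyle: { value: 'measured',     confidence: 0.81 },
  pitch:      { value: 'medium-low',   confidence: 0.9 },
};

trustedSignals(profile);
// → { emotion: 'stressed', age: '30s', accent: 'British (RP)', pitch: 'medium-low' }
//   (vocalStyle dropped: 0.81 is below its 0.85 cutoff)
```

Acting only above a cutoff is the simplest policy; the sections below add decay over time and agreement across utterances on top of it.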
Flows through the pipeline

Tone travels with every turn, from the mic to the answer.

The profile is injected into Router context, so the LLM sees the stress. Router steers TTS to reply softer and slower. Standalone emotion APIs stop at a score.
Emotion flows through the pipeline

STT detects emotion: stressed · 0.87
Router reasons with it: "match user tone"
TTS responds in kind: softer · [sigh]

Standalone emotion APIs return a score. Ours becomes part of the conversation.
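The STT → Router → TTS hand-off above can be sketched as a plain mapping. The steering values here are illustrative stand-ins, not the Router's actual vocabulary; inside the Realtime API this reasoning happens in the Router, not a lookup table.

```javascript
// Map a detected emotion to a TTS steering hint, mirroring the
// STT → Router → TTS flow. The mapping is illustrative.
const TTS_STEERING = {
  stressed: { tone: 'calm',       pace: 'slower' },
  anxious:  { tone: 'reassuring', pace: 'slower' },
  relieved: { tone: 'warm',       pace: 'normal' },
  neutral:  { tone: 'neutral',    pace: 'normal' },
};

function steerReply(utteranceProfile) {
  const { value: emotion, confidence } = utteranceProfile.emotion;
  // Below the cutoff, fall back to a neutral delivery rather than guessing.
  if (confidence < 0.7) return TTS_STEERING.neutral;
  return TTS_STEERING[emotion] ?? TTS_STEERING.neutral;
}

steerReply({ emotion: { value: 'stressed', confidence: 0.87 } });
// → { tone: 'calm', pace: 'slower' }
```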
Respond in kind
Works with
Realtime API · TTS

Your agent hears the tone, not just the words.

A stressed user gets a calm reply, not a checklist. The profile feeds the prompt directly. No separate emotion model. No glue code between layers.
User utterance
"I've been putting this off for months. I don't even know where to start."
emotion: stressed
pace: slow
agent responds: calm, short
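"The profile feeds the prompt directly" can be sketched as a string-building step. The prompt wording here is ours, purely for illustration; inside the Realtime API this injection happens automatically via Router context.

```javascript
// Fold high-confidence profile dimensions into the system prompt so the
// LLM sees them. Wording and the 0.7 cutoff are illustrative assumptions.
function buildSystemPrompt(voiceProfile) {
  const hints = Object.entries(voiceProfile)
    .filter(([, s]) => s.confidence >= 0.7)
    .map(([dim, s]) => `${dim}: ${s.value} (${s.confidence.toFixed(2)})`)
    .join(', ');
  return (
    `You are a voice agent. The caller currently sounds like this: ${hints}. ` +
    `Adapt to their state: a stressed caller gets a calm, short reply, not a checklist.`
  );
}

buildSystemPrompt({
  emotion: { value: 'stressed', confidence: 0.87 },
  pitch:   { value: 'medium-low', confidence: 0.9 },
});
// → "...emotion: stressed (0.87), pitch: medium-low (0.90)..."
```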
Scored, not labeled

Signals you can act on, not just observe.

Every signal ships with a confidence score between 0 and 1. Threshold, decay over time, or require agreement across utterances. Never stuck with a binary label.
Confidence, not a label
5 dimensions, scored per utterance
Threshold or decay on confidence. Your app, your rules.
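One way to decay a signal over time and require agreement across utterances, as a sketch. The half-life, window size, and cutoff are arbitrary choices, not API behavior.

```javascript
// Exponentially decay confidence as an utterance ages, and only act on an
// emotion once the last N utterances agree at sufficient decayed confidence.
const HALF_LIFE_MS = 30_000; // arbitrary half-life

function decayedConfidence(confidence, ageMs) {
  return confidence * Math.pow(0.5, ageMs / HALF_LIFE_MS);
}

function agreedEmotion(utterances, nowMs, { window = 2, cutoff = 0.5 } = {}) {
  const recent = utterances.slice(-window);
  if (recent.length < window) return null;
  const first = recent[0].emotion;
  const allAgree = recent.every(
    (u) => u.emotion === first &&
           decayedConfidence(u.confidence, nowMs - u.atMs) >= cutoff
  );
  return allAgree ? first : null;
}

const turns = [
  { emotion: 'anxious',  confidence: 0.72, atMs: 12_000 },
  { emotion: 'stressed', confidence: 0.87, atMs: 18_000 },
  { emotion: 'stressed', confidence: 0.83, atMs: 24_000 },
];
agreedEmotion(turns, 30_000); // → 'stressed' (last two turns agree)
```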
Tracks shifts in real time
Works with
Realtime API

A call isn't a single emotion. It's a curve.

The profile updates per utterance. Watch a call move from anxious to relieved and have your agent behave differently at each beat. De-escalation falls out naturally.
Tracks shifts in real time
call timeline
0:00 neutral
0:12 anxious
0:18 stressed
0:24 relieved
0:30 calm
A call isn't a single emotion. It's a curve. Your agent can follow it.
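Following the curve can be sketched as scanning the per-utterance timeline for calming transitions. Which emotions count as "tense" versus "calm" is our assumption for illustration, not part of the API.

```javascript
// Flag de-escalation: any move from a tense state to a calmer one.
// The emotion groupings are illustrative assumptions.
const TENSE = new Set(['anxious', 'stressed', 'frustrated']);
const CALM  = new Set(['relieved', 'calm', 'neutral']);

function deEscalations(timeline) {
  const events = [];
  for (let i = 1; i < timeline.length; i++) {
    const prev = timeline[i - 1];
    const curr = timeline[i];
    if (TENSE.has(prev.emotion) && CALM.has(curr.emotion)) {
      events.push({ at: curr.at, from: prev.emotion, to: curr.emotion });
    }
  }
  return events;
}

// The call timeline shown above:
const call = [
  { at: '0:00', emotion: 'neutral' },
  { at: '0:12', emotion: 'anxious' },
  { at: '0:18', emotion: 'stressed' },
  { at: '0:24', emotion: 'relieved' },
  { at: '0:30', emotion: 'calm' },
];
deEscalations(call);
// → [{ at: '0:24', from: 'stressed', to: 'relieved' }]
```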
Already in production
Works with
Realtime API

Already shipping in products people pay for.

Wellness apps match tone to user state. Support agents catch de-escalation. Companions feel less scripted. Voice profiling is the signal customers tell us surprises them.
Where empathic understanding ships
Wellness companion
Tone-matched support
Support agent
De-escalation detection
Coaching
Stress tracking over time
Consumer companion
Emotionally aware dialogue

Transcript + profile, in one call

Sync transcription returns both the text and the voice profile. Stream them over WebSocket for live profile updates.
```javascript
import fs from 'fs';

const audio = fs.readFileSync('clip.wav').toString('base64');

const resp = await fetch('https://api.inworld.ai/stt/v1/transcribe', {
  method: 'POST',
  headers: {
    Authorization: `Basic ${process.env.INWORLD_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    transcribeConfig: {
      modelId: 'inworld/inworld-stt-1',
      audioEncoding: 'AUTO_DETECT',
      language: 'en-US',
    },
    audioData: { content: audio },
  }),
});

const { transcription, voiceProfile } = await resp.json();
// voiceProfile = { emotion, age, accent, pitch, vocalStyle }
// each with { value, confidence }
```

FAQ

What does the voice profile include?
The profile covers emotion, age, accent, pitch, and vocal style. Each comes with a confidence score between 0 and 1, so you can threshold or decay depending on how the dimension is used downstream.

Which models return voice profiling?
Voice profiling is returned by the `inworld/inworld-stt-1` model. Other models on the Inworld STT API (Whisper, AssemblyAI) return transcripts only.

Which languages are supported?
`inworld/inworld-stt-1` is English-only at research-preview launch. Transcription in other languages is available through the Whisper and AssemblyAI models on the same endpoint.

Does it work with the Realtime API?
Yes. Inside the Realtime API, the profile flows automatically into Router context, so the LLM reasons with the user's emotional state. Router can emit TTS steering instructions that adjust voice tone. This cross-layer carry is unique to the Inworld pipeline.

How accurate is it?
The profile returns with confidence scores so you can decide when to trust it. An ensemble of a classifier and an LLM-based profiler is being rolled out to improve descriptive accuracy (emotion, style) while keeping user-context accuracy (age, accent) strong. This is a research preview; expect iteration.

How is the data handled?
Voice profiling runs on the same zero-data-retention STT pipeline as transcription. GDPR and SOC 2 Type II compliant. On-premise deployment is available for regulated environments; contact sales.

How is this different from sentiment analysis?
Sentiment analysis gives you a score on text after the fact. Voice profiling returns acoustic-derived signal on the audio itself, per utterance, across tone, identity, and acoustic style. The point isn't to label a conversation; it's to change how your agent behaves during one.

Is it production-ready?
Research preview. APIs are stable enough to build on; the profile model itself is improving weekly.

Hear how they feel. Respond like you did.

Voice profile on every utterance. Flows through the pipeline. Makes every reply sound like it knew.
Copyright © 2021-2026 Inworld AI