Model Configurations

Speech Configuration

Speech configuration defines the audio foundation of Myra: how callers are transcribed, which languages are recognized, and how Myra's response is synthesized back into speech.

Input: STT provider

Language: Single or multilingual

Output: TTS voice

Purpose

Configure how Myra listens and speaks

Speech configuration controls the real-time audio layer of a Vomyra agent. It defines the speech-to-text provider that transcribes the caller, the language settings used during recognition, and the text-to-speech provider that voices Myra's response.

Input: Speech-to-text
Turns caller audio into text for the language model.

Language: Recognition mode
Controls single-language or multilingual transcription behavior.

Output: Text-to-speech
Streams Myra's response back to the caller as audio.
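
The three settings above can be pictured as one small configuration object. This is a minimal sketch only: the class name `SpeechConfig`, its field names, the provider strings, and the `en-IN` default are hypothetical and are not taken from Vomyra's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical names throughout; not Vomyra's actual schema.
RECOGNITION_MODES = ("single", "multilingual")


@dataclass
class SpeechConfig:
    stt_provider: str                 # speech-to-text: caller audio -> text
    tts_provider: str                 # text-to-speech: response text -> audio
    recognition_mode: str = "single"  # "single" or "multilingual"
    languages: List[str] = field(default_factory=lambda: ["en-IN"])

    def validate(self) -> None:
        """Reject combinations of mode and languages that contradict each other."""
        if self.recognition_mode not in RECOGNITION_MODES:
            raise ValueError(f"unknown recognition mode: {self.recognition_mode!r}")
        if not self.languages:
            raise ValueError("at least one language is required")
        if self.recognition_mode == "single" and len(self.languages) != 1:
            raise ValueError("single mode expects exactly one language")
        if self.recognition_mode == "multilingual" and len(self.languages) < 2:
            raise ValueError("multilingual mode expects two or more languages")


config = SpeechConfig(stt_provider="example-stt", tts_provider="example-tts")
config.validate()  # passes: single mode with one language
```

Keeping the recognition mode and the language list in one object makes it harder to ship a mismatched pair, such as multilingual mode with only one language listed.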

Pipeline

Speech settings affect the full voice loop

A call feels natural only when transcription, model response, and voice output are tuned together. Improving one stage while ignoring the others can create slow, inaccurate, or unnatural calls.

Caller audio → Speech input → Model response → Voice output → Caller hears
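
One way to see why the stages need to be tuned together is to time each one within a single caller turn. The sketch below assumes stub functions in place of real speech-to-text, model, and text-to-speech calls; `transcribe`, `respond`, `synthesize`, and `run_turn` are hypothetical names used for illustration, not part of any Vomyra API.

```python
import time
from typing import Dict, Tuple


# Stub stages standing in for real STT, model, and TTS calls (all hypothetical).
def transcribe(audio: bytes) -> str:
    return "hello"


def respond(text: str) -> str:
    return f"You said: {text}"


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def run_turn(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run one caller turn and record the seconds spent in each stage."""
    timings: Dict[str, float] = {}
    value = audio
    for name, stage in (("stt", transcribe), ("llm", respond), ("tts", synthesize)):
        start = time.perf_counter()
        value = stage(value)  # output of one stage feeds the next
        timings[name] = time.perf_counter() - start
    return value, timings


reply_audio, timings = run_turn(b"\x00\x01")
total = sum(timings.values())
slowest = max(timings, key=timings.get)
print(f"total={total:.6f}s slowest={slowest}")
```

The caller experiences the total across all three stages, so a faster transcriber helps little if the slowest stage is elsewhere in the loop.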

Recommendation

For most Indian phone agents, start with Azure Speech for STT, a single primary language, and Azure TTS for the first production version. Expand to multilingual or custom voice providers after the baseline call quality is stable.
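
Written out as settings, that starting point might look like the following. The key names, the provider identifiers, and the `hi-IN` language code are illustrative assumptions rather than Vomyra's actual configuration format; substitute your agent's real primary language.

```python
# Hypothetical baseline for a first production version; key names and
# values are illustrative, not Vomyra's actual configuration format.
baseline = {
    "stt_provider": "azure",       # Azure Speech for speech-to-text
    "tts_provider": "azure",       # Azure TTS for voice output
    "recognition_mode": "single",  # one primary language to start
    "languages": ["hi-IN"],        # assumed example; use your primary language
}


def is_baseline(settings: dict) -> bool:
    """True when the settings follow the single-language starting point."""
    return (
        settings.get("recognition_mode") == "single"
        and len(settings.get("languages", [])) == 1
    )


print(is_baseline(baseline))
```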

