Speech Configuration
Speech configuration defines the audio foundation of Myra: how callers are transcribed, which languages are recognized, and how Myra's response is synthesized back into speech.
Input: STT provider
Language: Single or multilingual
Output: TTS voice
Purpose
Configure how Myra listens and speaks
Speech configuration controls the real-time audio layer of a Vomyra agent. It defines the speech-to-text provider that transcribes the caller, the language settings used during recognition, and the text-to-speech provider that voices Myra's response.
Speech-to-text
Turns caller audio into text for the language model.
Recognition mode
Controls single-language or multilingual transcription behavior.
Text-to-speech
Streams Myra's response back to the caller as audio.
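The three settings above can be pictured as one small configuration object. The following is a minimal sketch only: `SpeechConfig`, its field names, and the provider, voice, and language values are hypothetical placeholders for illustration, not Vomyra's actual configuration schema or API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpeechConfig:
    """Hypothetical container for the three speech settings (not a real Vomyra type)."""

    stt_provider: str                       # speech-to-text provider that transcribes the caller
    tts_provider: str                       # text-to-speech provider that voices the response
    tts_voice: str                          # voice used for synthesized output
    languages: Tuple[str, ...] = ("en-US",)  # languages recognized during transcription

    def __post_init__(self) -> None:
        # Recognition needs at least one language to work with.
        if not self.languages:
            raise ValueError("at least one recognition language is required")

    @property
    def recognition_mode(self) -> str:
        # One configured language means single-language mode; more means multilingual.
        return "single" if len(self.languages) == 1 else "multilingual"


# Placeholder values, assumed purely for illustration.
config = SpeechConfig(
    stt_provider="example-stt",
    tts_provider="example-tts",
    tts_voice="example-voice",
    languages=("en-US", "hi-IN"),
)
print(config.recognition_mode)
```

Deriving the recognition mode from the language list, rather than storing it separately, is one way to keep the two settings from drifting out of sync.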
Pipeline
Speech settings affect the full voice loop
A call feels natural only when transcription, model response, and voice output are tuned together. Improving one stage while ignoring the others can still leave calls slow, inaccurate, or unnatural.
Caller audio → Speech input → Model response → Voice output → Caller hears
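One way to reason about tuning the stages together is to treat the voice loop as a latency budget. The sketch below is illustrative only: the stage names mirror the pipeline above, while the function names and the millisecond figures are assumptions made up for the example, not measurements from Vomyra.

```python
from typing import Dict

# Processing stages between "Caller audio" and "Caller hears".
STAGES = ("speech_input", "model_response", "voice_output")


def loop_latency_ms(stage_ms: Dict[str, float]) -> float:
    """Total delay the caller experiences: the sum of every stage in the loop."""
    return sum(stage_ms[stage] for stage in STAGES)


def slowest_stage(stage_ms: Dict[str, float]) -> str:
    """The stage that contributes the most delay, and so the first one to tune."""
    return max(STAGES, key=lambda stage: stage_ms[stage])


# Made-up per-stage timings in milliseconds, for illustration only.
measured = {"speech_input": 300.0, "model_response": 700.0, "voice_output": 200.0}

print(loop_latency_ms(measured))
print(slowest_stage(measured))
```

With these example figures, speeding up speech input alone would leave most of the delay in place, because the model response stage dominates the total.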
Recommendation