Voice Pipeline
The voice pipeline connects audio cleanup, speech recognition, model generation, and voice synthesis into one real-time loop. Use this page to understand the full path from caller speech to Myra's audio response.
How speech moves through Vomyra in real time
The voice pipeline is the sequence of audio and AI stages that turns caller speech into Myra's response. Each stage affects latency, accuracy, interruption behavior, and voice quality.
STT
80-250 ms
Speech-to-text turns caller audio into a transcript.
Core idea
A natural call depends on the full pipeline
If Myra feels slow or interrupts callers, the issue may be denoising, VAD, STT, model latency, TTS, or network transport. Tune the pipeline as a system rather than changing one setting blindly.
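One way to tune the pipeline as a system is to write down a latency budget per stage and see which stage dominates before changing anything. The sketch below is illustrative only: the StageLatency type and both helper functions are hypothetical names, not part of Vomyra, and every figure except the 80-250 ms STT range is a placeholder that should be replaced with your own measurements.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StageLatency:
    """Best-case and worst-case latency of one pipeline stage, in milliseconds."""
    name: str
    low_ms: int
    high_ms: int


def total_budget(stages: List[StageLatency]) -> Tuple[int, int]:
    """Sum best-case and worst-case latency across all stages."""
    return sum(s.low_ms for s in stages), sum(s.high_ms for s in stages)


def slowest_stage(stages: List[StageLatency]) -> StageLatency:
    """Return the stage with the largest worst-case latency."""
    return max(stages, key=lambda s: s.high_ms)


# ASSUMPTION: only the STT range (80-250 ms) comes from this page.
# All other numbers are placeholders for illustration.
STAGES = [
    StageLatency("denoise", 5, 20),
    StageLatency("vad", 100, 300),
    StageLatency("stt", 80, 250),
    StageLatency("model", 200, 800),
    StageLatency("tts", 100, 300),
    StageLatency("network", 40, 150),
]

if __name__ == "__main__":
    low, high = total_budget(STAGES)
    worst = slowest_stage(STAGES)
    print(f"End-to-end budget: {low}-{high} ms")
    print(f"Largest worst-case contributor: {worst.name} ({worst.high_ms} ms)")
```

With these placeholder numbers the model stage dominates the worst case, so adjusting a VAD or TTS setting alone would not make the call feel much faster. With real measurements the dominant stage may be different.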
Accuracy
Clean audio and strong transcription produce better model responses.
Timing
VAD, response delay, model speed, and TTS determine how quickly the caller hears audio.
Reliability
Fallbacks and provider choices keep calls stable when a provider is slow or unavailable.
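The reliability point can be sketched as an ordered list of providers, each given a time limit before the next one is tried. This is a minimal illustration under stated assumptions, not Vomyra's implementation: synthesize_with_fallback and the three stand-in providers are hypothetical, and a real provider would be a network call to a TTS or STT service.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# A provider takes text and returns synthesized audio bytes (hypothetical shape).
Provider = Callable[[str], Awaitable[bytes]]


async def synthesize_with_fallback(
    text: str, providers: Sequence[Provider], timeout_s: float
) -> bytes:
    """Try each provider in order; skip any that is too slow or unavailable."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            # wait_for cancels the call if it exceeds the time limit.
            return await asyncio.wait_for(provider(text), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc  # remember the failure and move to the next provider
    raise RuntimeError("all providers failed") from last_error


# Stand-in providers for illustration only.
async def slow_provider(text: str) -> bytes:
    await asyncio.sleep(1.0)  # simulates a provider that responds too slowly
    return b"slow:" + text.encode()


async def down_provider(text: str) -> bytes:
    raise ConnectionError("provider unavailable")


async def backup_provider(text: str) -> bytes:
    return b"audio:" + text.encode()


if __name__ == "__main__":
    audio = asyncio.run(
        synthesize_with_fallback(
            "hello", [slow_provider, down_provider, backup_provider], timeout_s=0.05
        )
    )
    print(audio)
```

Trying providers in sequence keeps the logic simple, but each timeout adds to what the caller waits, so the per-provider limit should be small relative to the overall response time you are aiming for.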
Recommendation