Voice Pipeline

The voice pipeline connects audio cleanup, speech recognition, model generation, and voice synthesis into one real-time loop. Use this page to understand the full path from caller speech to Myra's audio response.

How speech moves through Vomyra in real time

The voice pipeline is the sequence of audio and AI stages that turns caller speech into Myra's response. Each stage affects latency, accuracy, interruption behavior, and voice quality.

STT (80-250 ms)

Speech-to-text turns caller audio into a transcript.

Core idea

A natural call depends on the full pipeline

If Myra feels slow or interrupts callers, the cause may lie in denoising, voice activity detection (VAD), speech-to-text (STT), model latency, text-to-speech (TTS), or network transport. Tune the pipeline as a system rather than changing one setting blindly.

Accuracy

Clean audio and strong transcription produce better model responses.

Timing

VAD, response delay, model speed, and TTS determine how quickly the caller hears audio.
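To make the timing relationship concrete, here is a rough latency budget sketch. It is not a Vomyra API: the `StageLatency` type and stage names are hypothetical, and only the STT range (80-250 ms) comes from this page. The other numbers are placeholders to replace with measurements from your own calls. The sketch also assumes stages run strictly one after another, which overstates the total when stages stream and overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageLatency:
    """Hypothetical record of one stage's latency range in milliseconds."""
    name: str
    low_ms: int
    high_ms: int


def time_to_first_audio(stages):
    """Sum per-stage ranges into a (best, worst) estimate in milliseconds."""
    best = sum(s.low_ms for s in stages)
    worst = sum(s.high_ms for s in stages)
    return best, worst


# Only the STT range is taken from this page; every other value is a placeholder.
stages = [
    StageLatency("vad_endpointing", 200, 500),
    StageLatency("stt", 80, 250),
    StageLatency("response_delay", 0, 200),
    StageLatency("model_first_token", 150, 600),
    StageLatency("tts_first_chunk", 100, 300),
]

best, worst = time_to_first_audio(stages)
slowest = max(stages, key=lambda s: s.high_ms)
print(f"time to first audio: {best}-{worst} ms")
print(f"largest worst-case contributor: {slowest.name}")
```

A budget like this helps show which stage dominates the worst case, so tuning effort goes to the stage that matters most.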

Reliability

Fallbacks and provider choices keep calls stable when a provider is slow or unavailable.
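One way to picture a fallback chain is sketched below. This is an illustrative pattern only, not how Vomyra implements fallbacks: `call_with_fallback` and the stub providers are hypothetical names, and the timeout value is arbitrary. Each provider is tried in order, and the next one is used when a provider raises an error or exceeds the timeout.

```python
import concurrent.futures
import time


def call_with_fallback(providers, text, timeout_s):
    """Try each (name, fn) pair in order; skip any that raises or exceeds timeout_s."""
    errors = []
    for name, fn in providers:
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn, text)
        try:
            return name, future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            errors.append(f"{name}: timed out")
        except Exception as exc:  # provider-specific failures
            errors.append(f"{name}: {exc}")
        finally:
            # Do not block on a slow provider; its worker thread finishes in the background.
            pool.shutdown(wait=False)
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real TTS providers.
def slow_tts(text):
    time.sleep(0.5)
    return b"slow-audio"


def down_tts(text):
    raise ConnectionError("provider unavailable")


def backup_tts(text):
    return f"audio({text})".encode()


used, audio = call_with_fallback(
    [("primary", slow_tts), ("secondary", down_tts), ("backup", backup_tts)],
    "Hello",
    timeout_s=0.1,
)
print(used, audio)
```

In a live call the timeout has to fit inside the overall latency budget, since time spent waiting on a slow provider is time the caller spends in silence.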

Recommendation

Start with a stable baseline: Standard denoising, single-language STT, moderate endpointing, a fast model, and streaming TTS. Then tune one stage at a time from real call recordings.
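The baseline and the one-stage-at-a-time rule can be sketched as a small config check. The `PipelineConfig` fields and their values are hypothetical and do not correspond to actual Vomyra setting names. The 500 ms endpointing value in particular is only a placeholder for "moderate".

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical baseline mirroring the recommendation above."""
    denoising: str = "standard"
    stt_language_mode: str = "single"
    endpointing_ms: int = 500  # placeholder for "moderate" endpointing
    model: str = "fast"
    tts_streaming: bool = True


def changed_fields(baseline, candidate):
    """Return the sorted names of settings that differ between two configs."""
    a, b = asdict(baseline), asdict(candidate)
    return sorted(key for key in a if a[key] != b[key])


def check_single_change(baseline, candidate):
    """Reject trials that change more than one setting at once."""
    diff = changed_fields(baseline, candidate)
    if len(diff) > 1:
        raise ValueError(f"change one stage at a time, got: {diff}")
    return diff


baseline = PipelineConfig()
trial = replace(baseline, endpointing_ms=350)
print(check_single_change(baseline, trial))
```

Keeping each trial to a single changed setting makes it possible to attribute any difference heard in call recordings to that one stage.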
