How Vomyra Works
Every Myra call runs the same four-step pipeline: transcribe the caller's speech, decide what to say, optionally call a tool, then speak the reply. The end-to-end target is under 700ms, fast enough to feel like a real conversation.
End-to-end latency target
< 700ms response latency
From end of caller speech to first audio byte
This is the time from when the caller stops speaking to when Myra begins replying. Human conversations feel natural at under 700ms. Vomyra targets this through streaming at every stage: transcription, LLM output, and TTS are all streamed rather than run strictly one after another, so each stage starts as soon as the first output from the previous stage arrives.
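The streaming idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `fake_llm`, `fake_tts`, the token list, and the delays are hypothetical stand-ins, not Vomyra's implementation. The point is only that the first audio chunk is produced before the LLM stage has finished generating.

```python
import asyncio

async def fake_llm(prompt: str):
    """Hypothetical LLM stage: yields reply tokens one at a time."""
    for token in ["Sure,", "your", "order", "shipped."]:
        await asyncio.sleep(0.01)  # simulated per-token generation delay
        yield token

async def fake_tts(tokens, events: list):
    """Hypothetical TTS stage: synthesizes each token as soon as it arrives."""
    async for token in tokens:
        events.append(f"llm:{token}")    # token received from the LLM stage
        await asyncio.sleep(0.005)       # simulated synthesis delay
        events.append(f"audio:{token}")  # audio chunk ready for the caller
        yield f"<audio {token}>"

async def run_turn(prompt: str) -> list:
    """Chain the two stages and record the order in which things happen."""
    events = []
    async for _chunk in fake_tts(fake_llm(prompt), events):
        pass  # a real system would send each chunk to the caller here
    return events

events = asyncio.run(run_turn("Where is my order?"))
print(events)
```

In the recorded order, `audio:Sure,` appears before `llm:shipped.`, so the caller would start hearing the reply while later tokens are still being generated.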
STT: 80–150ms
LLM: 300–500ms
Tool (if used): variable
TTS: 100–200ms
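As a rough check on the per-stage figures above, the ranges can be summed as if the stages ran strictly back to back. This is plain arithmetic on the published numbers; how each stage is measured is not stated here, so treat the result as an illustration rather than a benchmark. The names `STAGES_MS` and `sequential_budget` are made up for this sketch.

```python
# Per-stage latency ranges in milliseconds, copied from the figures above.
# The tool step is left out because its latency depends on your endpoint.
STAGES_MS = {"STT": (80, 150), "LLM": (300, 500), "TTS": (100, 200)}
TARGET_MS = 700

def sequential_budget(stages: dict) -> tuple:
    """Sum the low and high ends as if stages ran one after another."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high

low, high = sequential_budget(STAGES_MS)
print(f"Sequential sum without a tool call: {low}-{high} ms (target < {TARGET_MS} ms)")
```

The sum comes to 480–850 ms, so the upper end would miss the 700ms target if stages ran strictly in sequence. Overlapping the stages through streaming is what leaves headroom, and any tool call adds its own time on top.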
The pipeline, step by step
Transcribe (STT). Typical latency: ~80–150ms
Decide (LLM). Typical latency: ~300–500ms
Call a tool (optional). Typical latency: variable, depends on your endpoint
Speak (TTS). Typical latency: ~100–200ms
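The four steps can be sketched as a single turn handler. This is a simplified, non-streaming illustration under stated assumptions: `transcribe`, `decide`, `speak`, `Decision`, and the `order_status` tool are all hypothetical stand-ins that work on text instead of audio, and none of them are Vomyra APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Decision:
    """Hypothetical LLM output: a reply template plus an optional tool request."""
    reply: str
    tool_name: Optional[str] = None

def transcribe(audio: bytes) -> str:
    # Stand-in for STT: pretend the audio bytes are UTF-8 text.
    return audio.decode("utf-8")

def decide(transcript: str) -> Decision:
    # Stand-in for the LLM: request a tool only for order questions.
    if "order" in transcript.lower():
        return Decision(reply="Your order status is: {result}", tool_name="order_status")
    return Decision(reply="How can I help you today?")

def speak(text: str) -> bytes:
    # Stand-in for TTS: return the text as bytes instead of audio.
    return text.encode("utf-8")

def handle_turn(audio: bytes, tools: Dict[str, Callable[[str], str]]) -> bytes:
    transcript = transcribe(audio)          # step 1: transcribe
    decision = decide(transcript)           # step 2: decide what to say
    text = decision.reply
    if decision.tool_name is not None:      # step 3: optional tool call
        result = tools[decision.tool_name](transcript)
        text = text.format(result=result)
    return speak(text)                      # step 4: speak the reply

tools = {"order_status": lambda _transcript: "shipped"}
print(handle_turn(b"Where is my order?", tools))  # b'Your order status is: shipped'
print(handle_turn(b"Hello", tools))               # b'How can I help you today?'
```

The tool step is skipped entirely when the decision does not ask for one, which is why its latency only counts on turns that use it.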
What you configure vs. what Vomyra manages
You configure:
- Language and voice
- First message (opening line)
- System prompt (Myra's job)
- Tools and integrations
- Response delay and denoising
- Model and transcriber provider
Vomyra manages:
- Real-time audio streaming
- Transcription pipeline routing
- LLM inference and streaming
- Turn detection and interruption
- TTS streaming to caller
- Infrastructure, scaling, uptime
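To make the configurable side concrete, here is a minimal sketch of how those settings might be grouped. The class name `AssistantConfig`, every field name, and all values (including the language code, voice, and provider strings) are placeholders invented for this example; they do not reflect Vomyra's actual schema or dashboard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AssistantConfig:
    """Hypothetical container for the configurable settings; not Vomyra's schema."""
    language: str
    voice: str
    first_message: str
    system_prompt: str
    tools: List[str] = field(default_factory=list)
    response_delay_ms: int = 0
    denoising: bool = False
    model_provider: str = "example-llm"        # placeholder provider name
    transcriber_provider: str = "example-stt"  # placeholder provider name

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the config looks usable."""
        found = []
        if not self.first_message.strip():
            found.append("first_message is empty")
        if not self.system_prompt.strip():
            found.append("system_prompt is empty")
        if self.response_delay_ms < 0:
            found.append("response_delay_ms must not be negative")
        return found

config = AssistantConfig(
    language="hi-IN",
    voice="example-voice",
    first_message="Namaste, how can I help?",
    system_prompt="You are Myra, a booking assistant for a dental clinic.",
    tools=["check_availability"],
    response_delay_ms=200,
    denoising=True,
)
print(config.problems())  # prints []
```

Nothing in this sketch touches streaming, turn detection, or infrastructure, which mirrors the split: those concerns sit on the managed side.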
Dive deeper
Speech-to-Text
Transcriber selection for Indian languages and noisy call environments.
Language Models
Compare providers and understand how the system prompt drives behaviour.
Text-to-Speech
Voice selection, quality, and latency for spoken responses.
Tools
How Myra acts on your systems mid-call without blocking the conversation.