Advanced Settings

Response Delay

Control how quickly Myra responds after a caller finishes speaking. Tune endpointing and linear delay to prevent interruptions without making the conversation feel slow.

Default endpointing: 250 ms

Default linear delay: 400 ms

Recommended total: 400–800 ms

What is Response Delay?

Response Delay is the amount of time Vomyra waits before generating a response once it detects that the caller has stopped speaking. It determines how quickly Myra replies.

Choosing the correct delay is critical. Too low, and callers may be interrupted. Too high, and conversations feel slow and robotic. Most agents perform best between 400 and 800 ms of total response delay.

How voice response timing works

Voice Response Pipeline (total: 500–1250 ms; target: ~800 ms)

Speech-to-Text (80–150 ms): transcribes the caller's spoken words into text.

Endpointing (150–400 ms): detects end-of-turn by waiting for the configured silence window.

LLM Inference (200–500 ms): the language model processes context and generates a reply.

Text-to-Speech (80–200 ms): synthesises response audio and streams it to the caller.
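The pipeline total can be checked by adding up the per-stage ranges. The sketch below is only an illustration of that arithmetic: the dictionary and function names are hypothetical, and it assumes the four stages run strictly one after another (a streaming pipeline that overlaps stages would finish sooner).

```python
# Per-stage latency ranges in milliseconds, copied from the pipeline above.
# The names used here are illustrative, not part of any Vomyra API.
STAGES_MS = {
    "speech_to_text": (80, 150),
    "endpointing": (150, 400),
    "llm_inference": (200, 500),
    "text_to_speech": (80, 200),
}


def pipeline_budget(stages):
    """Return (best_case_ms, worst_case_ms), assuming stages run back to back."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best_ms, worst_ms = pipeline_budget(STAGES_MS)
print(f"best case: {best_ms} ms, worst case: {worst_ms} ms")
# prints "best case: 510 ms, worst case: 1250 ms"
```

The result lines up with the stated 500–1250 ms total, with the lower bound rounded.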

Response Delay controls the configurable waiting period after caller speech stops: endpointing plus any added linear delay. Reducing it lowers perceived latency; increasing it reduces false turn-ends.
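As a rough sketch of that relationship, the total response delay is the sum of the two settings, and it can be compared against the recommended 400–800 ms window. The class and field names below are hypothetical and do not reflect Vomyra's actual configuration schema; the default values are the ones quoted on this page.

```python
from dataclasses import dataclass

# Recommended total response delay from this page, in milliseconds.
RECOMMENDED_TOTAL_MS = (400, 800)


@dataclass(frozen=True)
class ResponseDelay:
    """Hypothetical container for the two configurable settings."""

    endpointing_ms: int = 250   # default endpointing quoted on this page
    linear_delay_ms: int = 400  # default linear delay quoted on this page

    @property
    def total_ms(self) -> int:
        # Total response delay = endpointing + linear delay.
        return self.endpointing_ms + self.linear_delay_ms

    def in_recommended_range(self) -> bool:
        low, high = RECOMMENDED_TOTAL_MS
        return low <= self.total_ms <= high


defaults = ResponseDelay()
print(defaults.total_ms, defaults.in_recommended_range())  # prints "650 True"
```

With both settings at the top of their typical ranges (400 ms and 600 ms), the total reaches 1000 ms and falls outside the recommended window.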

Configurable settings

Endpointing

Voice Activity Detection (VAD)

The duration of silence the transcriber waits before declaring a caller's turn complete. Once this window expires without detected speech, the STT engine emits an end-of-utterance signal. The LLM then begins generating, after any configured linear delay has elapsed.

Typical range: 150–400 ms

Recommended default: 200–250 ms
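To make the silence window concrete, here is a simplified simulation of endpointing over a stream of per-frame speech/silence decisions. It is a sketch under stated assumptions, not the transcriber's actual algorithm: the function name, the 20 ms frame size, and the boolean frame representation are all illustrative.

```python
def end_of_turn_ms(frames, frame_ms=20, silence_window_ms=250):
    """Return the time (ms) at which the turn is declared complete, or None.

    `frames` is a sequence of booleans, one per audio frame:
    True means speech was detected, False means silence.
    """
    silent_ms = 0
    heard_speech = False
    for index, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_ms = 0  # any speech resets the silence window
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= silence_window_ms:
                return (index + 1) * frame_ms
    return None  # no speech yet, or the silence window never expired


# 200 ms of speech, a 100 ms mid-sentence pause, 200 ms more speech, then silence.
call = [True] * 10 + [False] * 5 + [True] * 10 + [False] * 20

print(end_of_turn_ms(call, silence_window_ms=250))  # prints 760
print(end_of_turn_ms(call, silence_window_ms=100))  # prints 300
```

With a 250 ms window the turn ends only after the caller has truly finished. The deliberately short 100 ms window fires during the mid-sentence pause, which is the kind of false turn-end a longer window avoids.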

Linear Delay

Incremental response buffer

An additional fixed delay injected after endpointing fires, before the LLM starts. This buffer accounts for callers who naturally pause mid-sentence before completing a thought, which reduces premature responses.

Typical range: 300–600 ms

Recommended default: 400–450 ms
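The effect of the linear delay on mid-sentence pauses can be sketched with a little arithmetic. The helper names below are hypothetical, and the second function rests on an assumption not stated on this page: that a caller who resumes speaking before the combined wait has elapsed cancels the pending response.

```python
def llm_start_ms(speech_end_ms, endpointing_ms=250, linear_delay_ms=400):
    """Time at which the LLM starts, measured on the same clock as speech_end_ms."""
    return speech_end_ms + endpointing_ms + linear_delay_ms


def would_interrupt(pause_ms, endpointing_ms=250, linear_delay_ms=400):
    """True if a mid-sentence pause lasts at least as long as the combined wait.

    Assumption: resuming speech before the wait elapses cancels the response.
    """
    return pause_ms >= endpointing_ms + linear_delay_ms


# Caller stops speaking 1000 ms into the call; defaults are 250 ms + 400 ms.
print(llm_start_ms(1000))                       # prints 1650

# A 500 ms thinking pause is absorbed by the defaults...
print(would_interrupt(500))                     # prints False
# ...but not when the linear delay is removed.
print(would_interrupt(500, linear_delay_ms=0))  # prints True
```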

Important

Lower latency is not always better. A combined delay below 400 ms is appropriate for fast-paced outbound sales calls, but it will produce interruptions on support lines, with elderly callers, or in any scenario involving longer natural pauses. Always validate configuration changes with real test calls.

Latency benchmarks

Threshold	Perception	Guidance
< 500 ms	Imperceptible	Target for real-time sales and high-engagement outbound calls.
500–800 ms	Natural	Acceptable for most inbound support and booking agents.
800–1200 ms	Noticeable	Acceptable for complex queries; caller may perceive a brief pause.
> 1200 ms	Disruptive	Exceeds human conversational tolerance; investigate pipeline bottlenecks.
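When reviewing measured latencies from test calls, the thresholds in the table can be applied with a small helper. This is an illustrative sketch: the function name is hypothetical, and because the table's bands share their edges, it assumes that exactly 800 ms counts as Natural and exactly 1200 ms counts as Noticeable.

```python
def classify_latency(total_ms):
    """Map a measured response latency (ms) to the perception bands in the table."""
    if total_ms < 500:
        return "Imperceptible"
    if total_ms <= 800:   # assumption: the 800 ms edge belongs to "Natural"
        return "Natural"
    if total_ms <= 1200:  # assumption: the 1200 ms edge belongs to "Noticeable"
        return "Noticeable"
    return "Disruptive"


for sample_ms in (450, 650, 1000, 1300):
    print(sample_ms, classify_latency(sample_ms))
# prints:
# 450 Imperceptible
# 650 Natural
# 1000 Noticeable
# 1300 Disruptive
```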
