In transcriber settings, you define when the assistant decides that the user has finished speaking. This is one of the most effective areas to reduce perceived latency.

1. Endpoint type

With endpoint type, you choose how end-of-utterance is detected:
  • Speech detection (default): usually the fastest option and a strong choice when low latency matters.
  • AI detection: often better for longer user turns with thinking pauses.
For many assistants, start with speech detection and switch only if needed.
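This choice can be sketched as a small helper. The config keys and function below are illustrative assumptions, not a documented platform API:

```python
# Hypothetical transcriber config sketch; key names and values are
# assumptions for illustration, not a real platform schema.
assistant_config = {
    "transcriber": {
        # "speech" = speech detection (default, lowest latency);
        # "ai"     = model-based end-of-utterance detection for
        #            longer turns with thinking pauses.
        "endpoint_type": "speech",
    }
}

def pick_endpoint_type(expects_long_pauses: bool) -> str:
    """Start with speech detection; switch to AI detection only
    when users routinely pause mid-turn to think."""
    return "ai" if expects_long_pauses else "speech"

print(pick_endpoint_type(False))  # speech
```

The default here mirrors the guidance above: begin with speech detection and only switch when real conversations show premature cut-offs.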

2. Endpoint sensitivity

Endpoint sensitivity controls how long the assistant waits before assuming the user is done.
  • Lower setting: shorter wait before responding, quicker turn-taking.
  • Higher setting: gives slower speakers more room to finish.
If your audience tends to speak slowly, increase this setting slightly.
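One way to picture this setting is as a mapping from a sensitivity slider to a silence timeout. The 0.0–1.0 range and the millisecond bounds below are assumptions for illustration, not documented platform values:

```python
def silence_timeout_ms(sensitivity: float,
                       min_ms: int = 300,
                       max_ms: int = 2000) -> int:
    """Map an assumed 0.0-1.0 sensitivity slider to a silence
    timeout: higher sensitivity -> longer wait before the turn
    is considered finished (more room for slow speakers)."""
    sensitivity = min(max(sensitivity, 0.0), 1.0)  # clamp to [0, 1]
    return int(min_ms + sensitivity * (max_ms - min_ms))

print(silence_timeout_ms(0.0))  # 300  (fast, may cut people off)
print(silence_timeout_ms(1.0))  # 2000 (patient, adds latency)
```

Nudging the slider up for a slow-speaking audience, as suggested above, trades a few hundred milliseconds of latency for fewer accidental cut-offs.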

3. Interruption sensitivity

Interruption sensitivity controls how easily the assistant can be interrupted while speaking.
  • A common baseline is 0.50.
  • Outbound use cases: usually benefit from easier interruption.
  • Reception/front-desk use cases: often benefit from a bit more stability.
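A minimal sketch of how such a threshold could behave, assuming a hypothetical confidence score for detected user speech (the comparison rule is an illustrative assumption, not the platform's actual algorithm):

```python
def should_interrupt(user_speech_confidence: float,
                     interruption_sensitivity: float = 0.50) -> bool:
    """Decide whether detected user speech should cut off the
    assistant. Higher sensitivity -> easier to barge in."""
    # Sensitivity 0.50 (the common baseline) requires confidence
    # of at least 0.50; sensitivity 0.80 only requires 0.20.
    return user_speech_confidence >= (1.0 - interruption_sensitivity)

print(should_interrupt(0.4, 0.50))  # False: baseline holds the floor
print(should_interrupt(0.4, 0.80))  # True: outbound-style, easy barge-in
```

With this framing, outbound assistants raise the sensitivity so callers can redirect the conversation quickly, while front-desk assistants lower it so background noise does not cut off spoken instructions.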

4. VAD sensitivity (Speech-to-Speech)

For speech-to-speech models, you also get a VAD sensitivity setting. When enabled, additional voice activity detection (VAD) logic helps detect pauses and turn changes more naturally, similar to modern voice-chat behavior.
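The core idea behind VAD-based pause detection can be sketched with a toy energy threshold. Real VAD uses trained models over audio frames; the frame energies, threshold, and frame count below are simplified assumptions:

```python
def detect_pause(frame_energies: list[float],
                 threshold: float = 0.1,
                 min_silent_frames: int = 5) -> bool:
    """Toy VAD: report a pause when the trailing frames all fall
    below an energy threshold for long enough. Lower thresholds /
    more frames = less sensitive (fewer false turn changes)."""
    tail = frame_energies[-min_silent_frames:]
    return (len(tail) == min_silent_frames
            and all(e < threshold for e in tail))

speech_then_silence = [0.5] * 10 + [0.01] * 5
print(detect_pause(speech_then_silence))  # True: trailing silence
print(detect_pause([0.5] * 10))           # False: still speaking
```

Raising the sensitivity corresponds to a shorter silent-frame requirement: turn changes are caught sooner, at the cost of more false positives during brief hesitations.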

5. Recommended tuning workflow

  1. Start with defaults (speech detection, moderate slider values).
  2. Test with realistic conversations.
  3. Change only one setting per test run.
  4. Iterate toward your specific use case.
Every assistant behaves a bit differently; short iterative tests with realistic call scenarios yield the best final settings.

See also: Engine types, Interruptions, and Testing.