1. Endpoint type
With endpoint type, you choose how end-of-utterance is detected:
- Speech detection (default): usually faster and a strong choice for low-latency use cases.
- AI detection: often better for longer user turns with thinking pauses.
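As a minimal sketch of this choice, assuming a hypothetical settings object (the class and field names below are illustrative, not a real SDK):

```python
# Hypothetical settings sketch -- the names here are illustrative
# and not taken from any real voice-agent SDK.
from dataclasses import dataclass

@dataclass
class EndpointingConfig:
    # "speech" = fast, latency-focused detection (the default);
    # "ai" = model-based detection that tolerates thinking pauses.
    endpoint_type: str = "speech"

# Default: speech detection for low latency.
config = EndpointingConfig()

# Switch to AI detection for longer user turns with pauses.
config.endpoint_type = "ai"
```

The point of the sketch is only that this is a single categorical switch, not a slider: you pick one detection strategy per agent.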
2. Endpoint sensitivity
Endpoint sensitivity controls how long the assistant waits before assuming the user is done.
- Faster setting: less wait time, quicker responses.
- Higher setting: gives slower speakers more room to finish.
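One way to picture the slider is as a mapping from the setting to a wait time before the turn is considered finished. The function below is purely illustrative (the range 300–2000 ms and the linear mapping are assumptions, not documented values):

```python
# Hypothetical illustration: map an endpoint-sensitivity slider (0.0-1.0)
# to how long the assistant waits before assuming the user is done.
# The millisecond range and the linear mapping are assumptions.
def endpoint_wait_ms(setting: float, min_ms: int = 300, max_ms: int = 2000) -> int:
    """A lower ("faster") setting means less wait time and quicker
    responses; a higher setting gives slower speakers more room."""
    setting = max(0.0, min(1.0, setting))  # clamp to the slider range
    return int(min_ms + setting * (max_ms - min_ms))

fast = endpoint_wait_ms(0.0)      # shortest wait, snappiest responses
patient = endpoint_wait_ms(1.0)   # longest wait, room to finish
```

Whatever the real mapping is, the trade-off is the same: less wait risks cutting off slow speakers, more wait adds response latency.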
3. Interruption sensitivity
Interruption sensitivity controls how easily the assistant can be interrupted while speaking.
- A common baseline is 0.50.
- Outbound use cases: usually benefit from easier interruption.
- Reception/front-desk use cases: often benefit from a bit more stability.
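A rough mental model, not a real implementation: treat the setting as a threshold on how confident the system must be that the user intends to interrupt. The function and values below are hypothetical:

```python
# Hypothetical sketch: interruption sensitivity as a confidence threshold.
# The 0.50 baseline comes from the text above; everything else is assumed.
def should_interrupt(intent_confidence: float, sensitivity: float = 0.50) -> bool:
    """Higher sensitivity -> the assistant yields more easily."""
    # The assistant stops speaking once detected interruption intent
    # clears the bar implied by the sensitivity setting.
    return intent_confidence >= (1.0 - sensitivity)

# An outbound agent might run with a higher sensitivity (easier to cut off):
outbound_yields = should_interrupt(0.40, sensitivity=0.70)

# A front-desk agent might favor stability (harder to interrupt):
reception_yields = should_interrupt(0.40, sensitivity=0.30)
```

Under this toy model the same borderline speech interrupts the outbound agent but not the front-desk one, which matches the guidance above.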
4. VAD sensitivity (Speech-to-Speech)
For speech-to-speech models, you also get a VAD sensitivity setting. When enabled, an additional voice activity detection layer helps detect pauses and turn changes more naturally, similar to modern voice-chat behavior.
Recommended workflow
- Start with defaults (speech detection, moderate slider values).
- Test with realistic conversations.
- Change only one setting per test run.
- Iterate toward your specific use case.
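The workflow above can be sketched as code: start from defaults and change exactly one setting per test run, so you always know which change caused a difference. All names and default values here are illustrative assumptions:

```python
# Hypothetical sketch of the recommended workflow. The setting names
# and default values are illustrative, not from a real product config.
defaults = {
    "endpoint_type": "speech",        # default detection mode
    "endpoint_sensitivity": 0.50,     # moderate slider value
    "interruption_sensitivity": 0.50, # common baseline
    "vad_enabled": True,              # speech-to-speech models only
}

def next_test_run(config: dict, setting: str, value) -> dict:
    """Return a copy of the config with exactly one setting changed."""
    if setting not in config:
        raise KeyError(f"unknown setting: {setting}")
    trial = dict(config)  # never mutate the baseline in place
    trial[setting] = value
    return trial

# Run 1: defaults. Run 2: only interruption sensitivity changes.
run_2 = next_test_run(defaults, "interruption_sensitivity", 0.65)
```

Keeping the baseline immutable and diffing one key per run is what makes each test conversation attributable to a single setting.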

