With synthesizer settings, you can directly shape your assistant’s voice. You mainly work with three sliders:
  • Voice stability
  • Voice similarity
  • Speaking rate

1. Voice stability

Voice stability controls how monotone vs. expressive the voice sounds.
  • More to the right: steadier, more formal, more consistent.
  • More to the left: more emotional, friendlier, more dynamic.
A common starting point is around 0.30, but the right value depends on your use case.

2. Voice similarity

Voice similarity fine-tunes how closely the output stays to the chosen base voice. Move this slider further right if you want an emotional voice to hold its character more consistently, or if the output should stay closer to the original reference voice.

3. Speaking rate

Speaking rate depends heavily on the selected voice.
  • 1.0 is often a good baseline.
  • Some voices are naturally slower or faster, so tune accordingly.
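
If you track slider values outside the UI, the three settings can be captured in a small structure. The following is only a sketch: the class name, the field names, the 0.50 similarity default, and the 0.0 to 1.0 range for stability and similarity are assumptions for illustration, not documented product values. Only the 0.30 stability starting point and the 1.0 speaking-rate baseline come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesizerSettings:
    """Hypothetical container for the three sliders (names and ranges are assumed)."""

    stability: float = 0.30      # common starting point mentioned above
    similarity: float = 0.50     # assumed moderate default, not a documented value
    speaking_rate: float = 1.0   # baseline mentioned above

    def __post_init__(self) -> None:
        # Assumption: stability and similarity sliders run from 0.0 (left) to 1.0 (right).
        for name in ("stability", "similarity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        if self.speaking_rate <= 0:
            raise ValueError("speaking_rate must be positive")


# A steadier, more formal variant of the defaults.
formal = SynthesizerSettings(stability=0.60)
```

Keeping the values in one place makes it easier to note which combination you were listening to during a test.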
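
To tune the sliders, work in this order:
  1. Pick your voice first.
  2. Start with moderate default values.
  3. Change only one slider per test.
  4. Listen, compare, then continue.
Changing one slider at a time tells you which setting caused each difference you hear, so you reach a stable voice setup for your real scenario faster.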
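
The "one slider per test" rule can be written down as a small test plan. This is a minimal sketch: the function name, the dictionary keys, and the candidate values are hypothetical and only illustrate the idea of varying a single setting against a fixed baseline.

```python
def one_slider_per_test(baseline: dict, candidates: dict) -> list:
    """Return variants that each differ from the baseline in exactly one slider."""
    variants = []
    for slider, values in candidates.items():
        for value in values:
            if value == baseline[slider]:
                continue  # identical to the baseline, nothing to compare
            variant = dict(baseline)  # copy so the baseline stays untouched
            variant[slider] = value
            variants.append(variant)
    return variants


# Hypothetical baseline and candidate values for illustration only.
baseline = {"stability": 0.30, "similarity": 0.50, "speaking_rate": 1.0}
candidates = {"stability": [0.20, 0.40], "speaking_rate": [0.9, 1.1]}

plan = one_slider_per_test(baseline, candidates)
for variant in plan:
    print(variant)
```

Each entry in the plan is one listening test. Once a value wins, it becomes the new baseline for the next slider.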
Results are voice-specific. Test with realistic phrases from your real call flow, not only short demo sentences.

Continue with: TTS providers, Voice selection, and Testing.