Last updated: Aug. 29, 2025
1. Pick a Mode
Mode | Why choose it? | Notes |
---|---|---|
Speech-to-Speech (Multimodal) | Fastest turn-taking and most natural flow | Recommended starting point. Try the Gemini 2.5 engine (beta) for the lowest latency, but note it’s experimental and may be less stable. |
Pipeline | Maximum control over voice and long-form replies | If you select Pipeline, continue to the Transcriber step below. |
Record the same scenario in both modes and compare response time and caller satisfaction.
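One way to make this comparison concrete is to note a response time and a satisfaction score for each test call and summarize them per mode. The sketch below assumes you collect those numbers yourself; the `CallSample` type, its field names, and the 1 to 5 rating scale are hypothetical and not part of any platform API.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class CallSample:
    """One recorded test call (hypothetical structure for illustration)."""
    mode: str            # e.g. "speech-to-speech" or "pipeline"
    response_ms: float   # time from end of caller speech to assistant audio
    satisfaction: int    # caller rating, 1 (poor) to 5 (great)


def summarize(samples: list[CallSample]) -> dict[str, dict[str, float]]:
    """Group samples by mode; report median latency and mean satisfaction."""
    by_mode: dict[str, list[CallSample]] = {}
    for sample in samples:
        by_mode.setdefault(sample.mode, []).append(sample)
    return {
        mode: {
            "median_response_ms": median([s.response_ms for s in group]),
            "mean_satisfaction": mean([s.satisfaction for s in group]),
        }
        for mode, group in by_mode.items()
    }
```

The median is used for latency so that a single slow call does not dominate the comparison.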
2. Choose a Transcriber (Pipeline only)
Transcriber | Accuracy | Latency | Best for |
---|---|---|---|
Azure | ⭐️⭐️⭐️⭐️ | ⏱️⏱️⏱️ (slower) | Highest transcription fidelity |
Gladia | ⭐️⭐️⭐️ | ⏱️ (faster) | Good all-rounder for most languages |
Deepgram | ⭐️⭐️⭐️ | ⏱️ (faster) | Solid alternative—test which performs better for your language and audio setup |
Different languages, accents, and background noise can affect each engine differently. Run a quick A/B test and keep the best performer.
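A simple way to score such an A/B test is word error rate (WER): the word-level edit distance between what was actually said and what the transcriber produced, divided by the number of words actually said. WER is a common metric chosen here for illustration, not one this guide prescribes. The sketch assumes you have written down a reference transcript and copied each engine's output by hand; the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def best_transcriber(reference: str, transcripts: dict[str, str]) -> str:
    """Return the name of the transcriber with the lowest WER."""
    return min(transcripts,
               key=lambda name: word_error_rate(reference, transcripts[name]))
```

Run the same recorded audio through each engine so that differences in WER reflect the transcriber rather than the caller.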
3. Select an LLM Model
Model | Strengths | Trade-offs |
---|---|---|
GPT-4o | Smartest reasoning, handles complex prompts | Slightly higher latency and cost |
Gemini 2.5-Flash-Lite | Blazing-fast, still highly capable | May miss nuance in very complex tasks—test for your use case |
If speed is critical, start with Gemini 2.5-Flash-Lite. For sophisticated reasoning, use GPT-4o and offset latency by shortening replies.
4. Noise Cancellation
If callers are on speakerphone or in a noisy environment, keep noise cancellation ON. If call volume is low or some words are “clipped,” turn it OFF so the transcriber gets the full waveform.
If the assistant isn’t hearing you well, try turning noise cancellation off.
5. Conversation Timers
Parameter | Recommended | Why |
---|---|---|
Re-engagement | ≈ 30 s | Gives callers enough time to think. Lower values can feel pushy. |
Max silence duration | ≈ 60 s | Prevents premature hang-ups while still ending truly silent calls. |
Test different values in real calls—too low can interrupt, too high leaves awkward gaps.
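To reason about how the two timers relate, the sketch below models them as a single decision on the current length of silence. It is an assumption about how the timers interact (one re-engagement prompt, then a hang-up at the maximum), not the platform's documented behavior; the function name and return values are hypothetical.

```python
def silence_action(silent_for_s: float,
                   already_reengaged: bool,
                   reengage_after_s: float = 30.0,
                   max_silence_s: float = 60.0) -> str:
    """Decide what to do after `silent_for_s` seconds of caller silence."""
    if reengage_after_s >= max_silence_s:
        # Otherwise the call would end before the assistant could re-engage.
        raise ValueError("re-engagement must fire before the max silence cutoff")
    if silent_for_s >= max_silence_s:
        return "hang_up"
    if silent_for_s >= reengage_after_s and not already_reengaged:
        return "re_engage"
    return "wait"
```

Under this model, the recommended values give a caller 30 seconds to think, one prompt, and another 30 seconds before the call ends.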
6. Initial Message
Mode | How it’s used | Best practice |
---|---|---|
Pipeline | Read exactly as written (converted by TTS) | Write the greeting verbatim: “Hello, this is Alex from …”. |
Speech-to-Speech | Interpreted as a prompt by the model | Include instructions like “Greet the customer and say …” or prepend “say exactly:” to ensure literal output. |
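If you maintain one greeting for both modes, the difference in the table can be handled in a small helper. This is a minimal sketch: the `say exactly:` prefix comes from the table above, while the function name and the mode strings are hypothetical.

```python
def build_initial_message(mode: str, greeting: str) -> str:
    """Return the initial-message text suited to the given mode."""
    if mode == "pipeline":
        # Pipeline reads the text verbatim through TTS.
        return greeting
    if mode == "speech-to-speech":
        # Speech-to-Speech treats the text as a prompt, so ask for literal output.
        return f"say exactly: {greeting}"
    raise ValueError(f"unknown mode: {mode!r}")


print(build_initial_message("speech-to-speech", "Hello, this is Alex."))
```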
7. Ambient sound
Ambient sound adds subtle background noise to the assistant’s voice and is enabled by default.
If the assistant isn’t hearing you well, turn off ambient sound or lower the ambient volume.
8. Endpointing sliders
Control when your assistant starts talking with the endpointing sensitivity slider at the bottom of assistant settings.
Setting | Effect | Use when |
---|---|---|
Lower sensitivity | Assistant responds faster after caller stops speaking | You want snappy, quick-turn conversations |
Higher sensitivity | Assistant waits longer before responding | Callers give longer, more detailed replies |
If your assistant cuts off callers mid-sentence, increase sensitivity. If responses feel sluggish, decrease it.
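The tuning rule above can be applied systematically by tallying both kinds of problem across a batch of test calls. The sketch below assumes you count the incidents yourself while listening to recordings; the function name and return values are hypothetical.

```python
def suggest_sensitivity_change(cutoffs: int, sluggish_replies: int) -> str:
    """Suggest which way to move the endpointing sensitivity slider.

    cutoffs: times the assistant interrupted a caller mid-sentence.
    sluggish_replies: times the assistant felt slow to respond.
    """
    if cutoffs < 0 or sluggish_replies < 0:
        raise ValueError("counts must be non-negative")
    if cutoffs > sluggish_replies:
        return "increase"   # wait longer before responding
    if sluggish_replies > cutoffs:
        return "decrease"   # respond sooner
    return "keep"
```

Change the slider in small steps and re-run the same test calls, since the two problems trade off against each other.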
9. Debug using the call transcript
Step 1: Open Call history
Go to the Call history page in your dashboard.
Step 2: Select your latest test call
Open the most recent call you placed for this assistant.
Step 3: Inspect transcript and function calls
Review the transcript, function calls, and parameters to identify timing or logic issues. Confirm the assistant is using the expected mode, model, and tools per your configuration.
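The final confirmation can be done as a checklist comparison. The sketch below assumes you copy the values shown for the call into a dictionary by hand; the keys (`mode`, `model`, `tools`) and the tool name are hypothetical, since this guide does not describe an export format.

```python
def config_mismatches(expected: dict[str, object],
                      actual: dict[str, object]) -> list[str]:
    """List every setting whose observed value differs from the expected one."""
    problems = []
    for key, want in expected.items():
        got = actual.get(key)  # None if the setting was not recorded
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    return problems


expected = {"mode": "pipeline", "model": "GPT-4o", "tools": ["book_table"]}
observed = {"mode": "pipeline", "model": "Gemini 2.5-Flash-Lite", "tools": ["book_table"]}
for problem in config_mismatches(expected, observed):
    print(problem)
```

An empty list means the call ran with the configuration you intended, so any remaining issue is more likely in the prompt or timing settings.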