Last updated: Aug. 29, 2025
1. Pick a Mode
| Mode | Why choose it? | Notes |
|---|---|---|
| Speech-to-Speech (Multimodal) | Fastest turn-taking and most natural flow | Recommended starting point. Try the Gemini 2.5 engine (beta) for the lowest latency, but note it’s experimental and may be less stable. |
| Pipeline | Maximum control over voice and long-form replies | If you select Pipeline, continue to the Transcriber step below. |
2. Choose a Transcriber (Pipeline only)
| Transcriber | Accuracy | Latency | Best for |
|---|---|---|---|
| Azure | ⭐️⭐️⭐️⭐️ | ⏱️⏱️⏱️ (slower) | Highest transcription fidelity |
| Gladia | ⭐️⭐️⭐️ | ⏱️ (faster) | Good all-rounder for most languages |
| Deepgram | ⭐️⭐️⭐️ | ⏱️ (faster) | Solid alternative—test which performs better for your language and audio setup |
3. Select an LLM Model
| Model | Strengths | Trade-offs |
|---|---|---|
| GPT-4o | Smartest reasoning, handles complex prompts | Slightly higher latency and cost |
| Gemini 2.5-Flash-Lite | Blazing-fast, still highly capable | May miss nuance in very complex tasks—test for your use case |
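
The three tables above amount to a small decision tree, which can be sketched in code. This is only an illustration: `AssistantChoice` and `choose_setup` are hypothetical names, not part of any platform API, and the boolean inputs are assumed simplifications of the trade-offs listed in the tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssistantChoice:
    """Hypothetical container for the choices made in steps 1-3."""
    mode: str                   # "Speech-to-Speech" or "Pipeline"
    transcriber: Optional[str]  # only set in Pipeline mode
    llm: str


def choose_setup(need_voice_control: bool,
                 prefer_accuracy: bool,
                 complex_prompts: bool) -> AssistantChoice:
    # Step 1: Speech-to-Speech is the recommended starting point;
    # Pipeline is for maximum control over voice and long-form replies.
    mode = "Pipeline" if need_voice_control else "Speech-to-Speech"

    # Step 2: a transcriber is only chosen in Pipeline mode.
    # Azure trades latency for fidelity; Gladia is the faster all-rounder
    # (Deepgram is an equally valid fast option to test against it).
    transcriber = None
    if mode == "Pipeline":
        transcriber = "Azure" if prefer_accuracy else "Gladia"

    # Step 3: GPT-4o for complex prompts, Gemini 2.5-Flash-Lite for speed.
    llm = "GPT-4o" if complex_prompts else "Gemini 2.5-Flash-Lite"
    return AssistantChoice(mode, transcriber, llm)


print(choose_setup(need_voice_control=True, prefer_accuracy=True, complex_prompts=False))
```

Treat the result as a starting point for testing, not a final answer; the tables recommend trying the alternatives against your own language and audio setup.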
4. Noise Cancellation
If callers are on speakerphone or in a noisy environment, keep noise cancellation ON. If your call volume is low or some words are “clipped,” turn it OFF so the transcriber gets the full waveform.
5. Conversation Timers
| Parameter | Recommended | Why |
|---|---|---|
| Re-engagement | ≈ 30 s | Gives callers enough time to think. Lower values can feel pushy. |
| Max silence duration | ≈ 60 s | Prevents premature hang-ups while still ending truly silent calls. |
Test different values in real calls—too low can interrupt, too high leaves awkward gaps.
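
Before testing in real calls, a quick sanity check on the two timer values can catch obvious mistakes. The sketch below is hypothetical: `check_timers` is not a platform function, the 30 s and 60 s baselines are simply the recommendations from the table, and the ordering rule assumes that a call ended for silence cannot be re-engaged afterwards.

```python
from typing import List


def check_timers(reengagement_s: float, max_silence_s: float) -> List[str]:
    """Return human-readable notes about timer values that look risky."""
    if reengagement_s <= 0 or max_silence_s <= 0:
        raise ValueError("timer values must be positive")

    notes = []
    # Assumption: re-engagement is only useful if it fires before the
    # silence hang-up ends the call.
    if reengagement_s >= max_silence_s:
        notes.append("re-engagement should be shorter than max silence duration")
    # Baselines taken from the recommendations table (approx. 30 s and 60 s).
    if reengagement_s < 30:
        notes.append("re-engagement below ~30 s can feel pushy")
    if max_silence_s < 60:
        notes.append("max silence below ~60 s risks premature hang-ups")
    return notes


print(check_timers(30, 60))  # recommended values produce no notes
print(check_timers(10, 60))
```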
6. Initial Message
| Mode | How it’s used | Best practice |
|---|---|---|
| Pipeline | Read exactly as written (converted by TTS) | Write the greeting verbatim: “Hello, this is Alex from …”. |
| Speech-to-Speech | Interpreted as a prompt by the model | Include instructions like “Greet the customer and say …”, or prepend `say exactly:` to ensure literal output. |
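
The difference between the two modes can be captured in a small helper that prepares the initial message. This is a minimal sketch: `build_initial_message` and the mode strings are hypothetical names chosen for illustration, and only the `say exactly:` prefix comes from the table above.

```python
def build_initial_message(mode: str, greeting: str) -> str:
    """Format a greeting so it is spoken literally in either mode."""
    if mode == "pipeline":
        # Pipeline reads the text exactly as written via TTS.
        return greeting
    if mode == "speech-to-speech":
        # Speech-to-Speech treats the text as a prompt, so ask for
        # literal output explicitly.
        return f"say exactly: {greeting}"
    raise ValueError(f"unknown mode: {mode!r}")


print(build_initial_message("pipeline", "Hello, thanks for calling."))
print(build_initial_message("speech-to-speech", "Hello, thanks for calling."))
```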
7. Ambient sound
Ambient sound adds subtle background noise to the assistant’s voice and is enabled by default.
8. Endpointing sliders
Control when your assistant starts talking with the endpointing sensitivity slider at the bottom of assistant settings.
| Setting | Effect | Use when |
|---|---|---|
| Lower sensitivity | Assistant responds faster after caller stops speaking | You want snappy, quick-turn conversations |
| Higher sensitivity | Assistant waits longer before responding | Callers give longer, more detailed replies |
9. Debug using the call transcript
Step 1. Open Call history. Go to the Call history page in your dashboard.

Step 2. Select your latest test call. Open the most recent call you placed for this assistant.

Step 3. Inspect transcript and function calls. Review the transcript, function calls, and parameters to identify timing or logic issues. Confirm the assistant is using the expected mode, model, and tools per your configuration.
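
If you copy a transcript out for closer inspection, timing issues can be spotted programmatically. The sketch below assumes a hypothetical transcript shape (a list of turns with `speaker`, `start`, `end`, and `text` fields, times in seconds); the dashboard's actual export format may differ, and the 2-second threshold is an arbitrary example value.

```python
from typing import Dict, List


def slow_responses(turns: List[Dict], threshold_s: float = 2.0) -> List[Dict]:
    """Flag assistant turns that began more than threshold_s after the caller stopped."""
    flagged = []
    for prev, cur in zip(turns, turns[1:]):
        if prev["speaker"] == "caller" and cur["speaker"] == "assistant":
            gap = cur["start"] - prev["end"]
            if gap > threshold_s:
                flagged.append({"text": cur["text"], "gap_s": round(gap, 2)})
    return flagged


# Hypothetical sample data for illustration only.
sample = [
    {"speaker": "assistant", "start": 0.0, "end": 2.5, "text": "Hello, how can I help?"},
    {"speaker": "caller", "start": 3.0, "end": 6.0, "text": "I'd like to book an appointment."},
    {"speaker": "assistant", "start": 9.5, "end": 12.0, "text": "Sure, what day works?"},
    {"speaker": "caller", "start": 12.4, "end": 13.0, "text": "Tuesday."},
    {"speaker": "assistant", "start": 13.6, "end": 15.0, "text": "Tuesday it is."},
]

print(slow_responses(sample))
```

Long gaps flagged this way are a hint to revisit the mode, transcriber, or endpointing settings described earlier.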

