A quick guide to fine-tuning the mode, transcriber, model, and other settings for the best call experience.
Mode | Why choose it? | Notes |
---|---|---|
Speech-to-Speech (Multimodal) | Fastest turn-taking and most natural flow | Recommended starting point. Try the Gemini 2.5 engine (beta) for the lowest latency, but note it’s experimental and may be less stable. |
Pipeline | Maximum control over voice and long-form replies | If you select Pipeline, continue to the Transcriber step below. |
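As a rough sketch, the mode choice above could be expressed as a config object. The field names and the `gemini-2.5-beta` engine identifier here are illustrative assumptions, not a real platform API:

```python
# Illustrative call-mode config builder -- field names and engine IDs
# are hypothetical, not taken from a real platform API.

def build_mode_config(mode: str, use_gemini_beta: bool = False) -> dict:
    """Return a call-mode config; Speech-to-Speech is the recommended default."""
    if mode not in ("speech-to-speech", "pipeline"):
        raise ValueError(f"unknown mode: {mode}")
    config = {"mode": mode}
    if mode == "speech-to-speech" and use_gemini_beta:
        # Experimental engine: lowest latency, but may be less stable.
        config["engine"] = "gemini-2.5-beta"
    return config

print(build_mode_config("speech-to-speech", use_gemini_beta=True))
print(build_mode_config("pipeline"))
```

If you select Pipeline, the engine choice is irrelevant and the Transcriber step below applies instead.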
Transcriber | Accuracy | Latency | Best for |
---|---|---|---|
Azure | ⭐️⭐️⭐️⭐️ | ⏱️⏱️⏱️ (slower) | Highest transcription fidelity |
Gladia | ⭐️⭐️⭐️ | ⏱️ (faster) | Good all-rounder for most languages |
Deepgram | ⭐️⭐️⭐️ | ⏱️ (faster) | Solid alternative—test which performs better for your language and audio setup |
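The transcriber trade-off above can be encoded as a small lookup, shown here as a sketch. The numeric ratings simply mirror the stars and clocks in the table; they are not measured benchmarks:

```python
# Transcriber trade-offs from the table above (4 = more accurate,
# 1 = lower latency). Ratings mirror the table, not real benchmarks.

TRANSCRIBERS = {
    "azure":    {"accuracy": 4, "latency": 3},  # highest fidelity, slower
    "gladia":   {"accuracy": 3, "latency": 1},  # fast all-rounder
    "deepgram": {"accuracy": 3, "latency": 1},  # fast alternative
}

def pick_transcriber(prefer: str = "latency") -> str:
    """Pick the transcriber that best matches the stated priority."""
    if prefer == "accuracy":
        key = lambda n: (-TRANSCRIBERS[n]["accuracy"], TRANSCRIBERS[n]["latency"])
    else:
        key = lambda n: (TRANSCRIBERS[n]["latency"], -TRANSCRIBERS[n]["accuracy"])
    return min(TRANSCRIBERS, key=key)

print(pick_transcriber("accuracy"))
print(pick_transcriber("latency"))
```

Since Gladia and Deepgram tie on paper, the table's advice stands: test both against your actual language and audio setup.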
Model | Strengths | Trade-offs |
---|---|---|
GPT-4o | Smartest reasoning, handles complex prompts | Slightly higher latency and cost |
Gemini 2.5-Flash-Lite | Blazing-fast, still highly capable | May miss nuance in very complex tasks—test for your use case |
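The model trade-off above amounts to a simple heuristic, sketched below. The two model names come from the table; the complexity classification itself is an illustrative assumption you would tune for your use case:

```python
# Sketch of the model trade-off above. The model names are the two
# options from the table; the complexity heuristic is illustrative.

def choose_model(task_complexity: str) -> str:
    """Prefer the fast model unless the task needs deep reasoning."""
    if task_complexity == "complex":
        return "gpt-4o"             # smartest reasoning, a bit slower/pricier
    return "gemini-2.5-flash-lite"  # blazing-fast, may miss nuance

print(choose_model("complex"))
print(choose_model("simple"))
```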
Parameter | Recommended | Why |
---|---|---|
Re-engagement | ≈ 30 s | Gives callers enough time to think. Lower values can feel pushy. |
Max silence duration | ≈ 60 s | Prevents premature hang-ups while still ending truly silent calls. |
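A sketch of the recommended timing values as a config, with a sanity check reflecting the reasoning in the table. The field names are assumptions, not real platform parameters:

```python
# Illustrative timing config using the recommended values above.
# Field names are hypothetical, not a real platform API.

RECOMMENDED_TIMING = {
    "re_engagement_seconds": 30,  # give callers enough time to think
    "max_silence_seconds": 60,    # end truly silent calls, not pauses
}

def validate_timing(cfg: dict) -> list:
    """Flag values that commonly cause a pushy or abrupt call feel."""
    warnings = []
    if cfg["re_engagement_seconds"] < 30:
        warnings.append("re-engagement under 30s can feel pushy")
    if cfg["max_silence_seconds"] <= cfg["re_engagement_seconds"]:
        warnings.append("max silence should exceed the re-engagement window")
    return warnings

print(validate_timing(RECOMMENDED_TIMING))
```

Keeping max silence well above the re-engagement window ensures the assistant re-prompts at least once before hanging up.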
Mode | How it’s used | Best practice |
---|---|---|
Pipeline | Read exactly as written (converted by TTS) | Write the greeting verbatim: “Hello, this is Alex from …”. |
Speech-to-Speech | Interpreted as a prompt by the model | Include instructions like “Greet the customer and say …” or prepend say exactly: to ensure literal output. |
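The distinction above can be sketched as a small helper: in Pipeline mode the greeting text is spoken verbatim, while in Speech-to-Speech it is a prompt, so the guide's `say exactly:` prefix is prepended to force literal output. The helper name and the "Acme" company name are placeholders:

```python
# Sketch of how the same greeting is handled per mode, per the table
# above. Helper name and the "Acme" example company are illustrative.

def build_greeting(mode: str, text: str) -> str:
    if mode == "pipeline":
        # Read verbatim by TTS -- write exactly what should be spoken.
        return text
    # Speech-to-Speech: the model treats this as a prompt, so prepend
    # "say exactly:" to ensure literal output.
    return f"say exactly: {text}"

print(build_greeting("pipeline", "Hello, this is Alex from Acme."))
print(build_greeting("speech-to-speech", "Hello, this is Alex from Acme."))
```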
Setting | Effect | Use when |
---|---|---|
Lower sensitivity | Assistant responds faster after caller stops speaking | You want snappy, quick-turn conversations |
Higher sensitivity | Assistant waits longer before responding | Callers give longer, more detailed replies |
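One way to picture the sensitivity setting is as the wait time after the caller stops speaking before the assistant responds. The mapping below is purely illustrative; the parameter name and millisecond values are assumptions, not platform defaults:

```python
# Illustrative mapping from sensitivity to end-of-turn wait time.
# Parameter name and values are assumptions, not platform defaults.

def end_of_turn_wait_ms(sensitivity: str) -> int:
    """Lower sensitivity -> respond sooner; higher -> wait for longer replies."""
    waits = {"low": 300, "medium": 700, "high": 1500}
    return waits[sensitivity]

print(end_of_turn_wait_ms("low"), end_of_turn_wait_ms("high"))
```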
To verify your setup after a test call:

1. Open Call history.
2. Select your latest test call.
3. Inspect the transcript and function calls.