Understand the two voice generation modes available for your AI assistants and when to use each one
| Mode | How it works | Typical latency | Best for | Voice options |
| --- | --- | --- | --- | --- |
| Pipeline | Speech-to-Text → LLM → Text-to-Speech | ~800–1500 ms | Complex reasoning, dynamic prompts, multi-sentence replies | All library voices, including custom-cloned |
| Speech-to-Speech (Multimodal) | Direct speech-to-speech generation (no intermediate text) | ~300–600 ms | Snappy back-and-forth, short and reactive replies | Limited set; expanding over time |
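
If you configure assistants programmatically, the mode and voice usually live together in the assistant's configuration. The TypeScript sketch below is illustrative only: the field names (`voiceMode`, `voiceId`) and the `AssistantConfig` type are assumptions, not a documented schema, so map them onto whatever your settings UI or API actually exposes.

```typescript
// Illustrative only: field names here are assumptions, not a documented schema.
type VoiceMode = "pipeline" | "speech-to-speech";

interface AssistantConfig {
  name: string;
  voiceMode: VoiceMode;
  // Only Pipeline mode can use any library or custom-cloned voice.
  voiceId?: string;
}

// Pipeline: ~800–1500 ms latency, full voice library, suits long, reasoned replies.
const supportAgent: AssistantConfig = {
  name: "support-agent",
  voiceMode: "pipeline",
  voiceId: "custom-cloned-voice-id", // hypothetical voice identifier
};

// Speech-to-Speech: ~300–600 ms latency, limited voice set, suits quick turns.
const receptionist: AssistantConfig = {
  name: "receptionist",
  voiceMode: "speech-to-speech",
};
```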
1. Open the assistant's settings.
2. Select a mode: Pipeline for complex, multi-sentence replies, or Speech-to-Speech for the fastest back-and-forth (a scripted sketch follows these steps).
3. Choose a voice (Pipeline only; Speech-to-Speech uses its more limited voice set).
4. Place a quick test call to compare latency and voice quality.
5. Decide on a mode and roll it out.
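
If your workspace exposes a REST API, steps 2–4 can be scripted. The sketch below is a minimal example under assumed endpoints (`PATCH /assistants/{id}` to change the mode, `POST /calls` to start a test call) and an assumed bearer-token header; substitute the real endpoints and payload shape from your API reference.

```typescript
// Hypothetical endpoints and payload shape; adjust to your actual API reference.
const BASE_URL = "https://api.example.com";
const API_KEY = process.env.VOICE_API_KEY ?? "";

async function setVoiceMode(
  assistantId: string,
  voiceMode: "pipeline" | "speech-to-speech",
): Promise<unknown> {
  // Steps 2–3: select a mode (and, for Pipeline, optionally a voice).
  const res = await fetch(`${BASE_URL}/assistants/${assistantId}`, {
    method: "PATCH",
    headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ voiceMode }),
  });
  if (!res.ok) throw new Error(`Failed to update assistant: ${res.status}`);
  return res.json();
}

async function placeTestCall(assistantId: string, phoneNumber: string): Promise<unknown> {
  // Step 4: place a quick test call to hear latency and voice quality yourself.
  const res = await fetch(`${BASE_URL}/calls`, {
    method: "POST",
    headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ assistantId, phoneNumber }),
  });
  if (!res.ok) throw new Error(`Failed to start test call: ${res.status}`);
  return res.json();
}
```

Running this once per mode against the same assistant gives you a direct comparison for step 5: keep whichever mode sounds better for your use case.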