Best Practices

This page collects practical recommendations for building effective voice AI agents on Voxagent. These tips are drawn from production experience and apply to most voice agent use cases.

Voice Settings

Stability and Speed

The TTS voice stability and speed settings have a significant impact on how natural your agent sounds.

Speed: Keep between 0.9x and 1.1x for natural conversation. Going faster can sound rushed; going slower can feel sluggish.
Stability: A value between 0.5 and 0.75 provides a good balance of expressiveness and consistency. Lower stability sounds more emotive but less predictable.

Test your voice settings by running a few conversations and listening back. Small adjustments make a big difference in perceived quality.
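
As a rough illustration, the ranges above can be turned into a small pre-deployment check. This is a minimal sketch, not part of Voxagent: the `VoiceSettings` class, its field names, and `check_voice_settings` are hypothetical, and only the numeric ranges come from the recommendations above.

```python
from dataclasses import dataclass

# Ranges from the recommendations above. All names here are hypothetical
# and do not correspond to a real Voxagent API.
SPEED_RANGE = (0.9, 1.1)
STABILITY_RANGE = (0.5, 0.75)


@dataclass
class VoiceSettings:
    speed: float      # playback speed multiplier, e.g. 1.0 for 1.0x
    stability: float  # voice stability value


def check_voice_settings(settings: VoiceSettings) -> list[str]:
    """Return warnings for values outside the recommended ranges.

    An empty list means both values fall inside the ranges.
    """
    warnings = []
    if not SPEED_RANGE[0] <= settings.speed <= SPEED_RANGE[1]:
        warnings.append(f"speed {settings.speed} is outside {SPEED_RANGE}")
    if not STABILITY_RANGE[0] <= settings.stability <= STABILITY_RANGE[1]:
        warnings.append(f"stability {settings.stability} is outside {STABILITY_RANGE}")
    return warnings
```

A check like this only flags values that leave the recommended ranges. It does not replace listening to real conversations.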

Background Sounds

Adding a subtle background sound (e.g., office ambience) can make the agent sound more realistic and reduce the "uncanny valley" effect of pure silence between turns.

Use background sounds sparingly: they should be barely noticeable, not distracting.

Conversation Design

Turn Timeout

The turn timeout controls how long the agent waits after the caller stops speaking before it responds. The right value depends on your use case:

Use Case	Recommended Timeout	Rationale
Customer support	5 - 10 seconds	Callers may pause to think or look up information.
Data collection	10 - 15 seconds	Callers need time to find account numbers, addresses, etc.
Quick Q&A	3 - 5 seconds	Fast-paced interactions benefit from shorter timeouts.
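
One way to keep these recommendations next to your agent configuration is a simple lookup. The sketch below is illustrative only: the dictionary keys and `pick_turn_timeout` are hypothetical names, and starting at the midpoint of each range is an assumption, not a Voxagent default.

```python
# Recommended ranges in seconds, copied from the table above.
# The keys and function name are hypothetical.
TURN_TIMEOUT_SECONDS = {
    "customer_support": (5, 10),
    "data_collection": (10, 15),
    "quick_qa": (3, 5),
}


def pick_turn_timeout(use_case: str) -> float:
    """Return the midpoint of the recommended range as a starting value."""
    low, high = TURN_TIMEOUT_SECONDS[use_case]
    return (low + high) / 2
```

Treat the returned value as a starting point and adjust it after listening to test calls.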

Interruption Handling

In voice conversations, callers will sometimes speak while the agent is still talking. This is natural behavior. Configure your agent to handle interruptions gracefully:

Allow the agent to be interrupted during informational responses.
Use shorter sentences so the agent yields naturally.
In your system prompt, instruct the agent to acknowledge interruptions (e.g., "Sorry, go ahead.").

Silence Handling

Extended silence can confuse callers. Add guidance in your system prompt:

Less effective:

Wait for the user to respond.

Recommended:

If the user is silent for more than 10 seconds, gently prompt them:
"Are you still there? Take your time."
If they remain silent after two prompts, say:
"It seems like you may have stepped away. I'll end the call for now.
Feel free to call back anytime."
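
The recommended prompt describes a small decision procedure: wait, prompt up to two times, then end the call. The sketch below writes that logic out explicitly so it can be used in test scripts. It is an assumption-laden illustration: the function name, the action labels, and the idea that silence is measured since the caller last spoke or was last prompted are all hypothetical.

```python
# Thresholds taken from the recommended prompt above.
SILENCE_THRESHOLD_SECONDS = 10
MAX_PROMPTS = 2


def silence_action(silent_seconds: float, prompts_sent: int) -> str:
    """Decide what the agent should do during caller silence.

    silent_seconds: time since the caller last spoke or was last prompted.
    prompts_sent:   number of "Are you still there?" prompts already sent.
    """
    if silent_seconds <= SILENCE_THRESHOLD_SECONDS:
        return "wait"
    if prompts_sent < MAX_PROMPTS:
        return "prompt"
    return "end_call"
```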

Model Selection

Your choice of LLM affects response quality, latency, and cost.

Model	Latency	Best For	Trade-offs
GPT-4o	Medium	General-purpose support, complex reasoning	Higher cost, slightly higher latency
GPT-4o Mini	Low	High-volume, straightforward tasks	Less capable with nuanced reasoning
Flash models	Very Low	Low-latency requirements, simple interactions	May struggle with multi-step reasoning
Claude (Sonnet/Haiku)	Medium/Low	Complex reasoning, detailed instructions, long prompts	Availability depends on configuration

Start with a general-purpose model like GPT-4o during development. Once your agent is stable, test with faster models to see if quality remains acceptable at lower latency and cost.
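
If you record a quality score and a latency figure for each model you test, the comparison can be reduced to "the fastest model that still meets the quality bar". The sketch below assumes you supply your own measurements. `ModelResult`, its fields, and `fastest_acceptable` are hypothetical names, and nothing here calls a real API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResult:
    name: str
    quality: float     # your own score, e.g. fraction of test calls judged acceptable
    latency_ms: float  # your own measured response latency


def fastest_acceptable(
    results: list[ModelResult], min_quality: float
) -> Optional[ModelResult]:
    """Return the lowest-latency result that meets the quality bar, or None."""
    acceptable = [r for r in results if r.quality >= min_quality]
    return min(acceptable, key=lambda r: r.latency_ms, default=None)
```

How you define the quality score is up to you. The selection step only assumes that higher is better.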

Prompt Engineering for Voice

Voice agents require different prompting strategies than text-based chatbots. See the dedicated Prompting Guide for full details. The key point is to keep responses short and conversational:

Less effective:

You are a helpful assistant. Answer the user's questions thoroughly
and provide detailed explanations with examples.

Recommended:

You are a friendly phone support agent. Keep responses to 2-3
sentences. Speak naturally and conversationally. If a topic needs
a longer explanation, break it into steps and confirm understanding
after each step.

Multi-Agent Patterns

For complex workflows, use multiple agents connected by transfer rules.

Orchestrator + Specialists

A common pattern is to use one "front door" agent that routes callers to specialized agents:

Orchestrator Agent: Greets the caller, identifies their intent, and transfers to the right specialist.
Billing Agent: Handles billing inquiries, payment issues, and plan changes.
Technical Support Agent: Handles product questions, troubleshooting, and bug reports.
Scheduling Agent: Handles appointment booking and rescheduling.

Configure transfer rules on the orchestrator agent with clear conditions:

"Transfer to the billing agent when the caller mentions invoices, payments, charges, or subscription changes."
"Transfer to technical support when the caller describes a problem, error, or needs help using the product."

Keep each specialist agent's system prompt focused on its domain. This improves response quality compared to a single agent that tries to handle everything.
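
When testing transfer rules, it can help to write down which agent you expect for a set of sample utterances. The sketch below uses plain keyword matching as a stand-in for the routing decision, so that those expectations can be listed and checked. This is a deliberate simplification: transfer rules on the platform are written as natural-language conditions, and the rule table, agent labels, and `route` function here are hypothetical.

```python
# Keywords loosely derived from the example transfer conditions above.
# Substring matching is a simplification used only for illustration.
TRANSFER_RULES = {
    "billing": ("invoice", "payment", "charge", "subscription"),
    "technical_support": ("problem", "error", "help using"),
}


def route(utterance: str) -> str:
    """Return the expected destination agent for a caller utterance."""
    text = utterance.lower()
    for agent, keywords in TRANSFER_RULES.items():
        if any(keyword in text for keyword in keywords):
            return agent
    # No rule matched: the caller stays with the orchestrator.
    return "orchestrator"
```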

Pre-Tool Speech

Pre-tool speech is a short phrase the agent says just before a tool runs. Always configure it for tools that involve network calls, because silence during a phone call feels much longer than it actually is.

Less effective:

Agent goes silent for 3 seconds while calling the API.

Recommended:

Agent says: "Let me check that for you, one moment." The API call runs while the caller hears this, so they know why there is a pause.
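
The timing idea can be sketched with `asyncio`: start the tool call, speak the filler phrase while it runs, then speak the result. Everything below is a stand-in under stated assumptions. `speak` and `slow_lookup` are hypothetical placeholders for text-to-speech playback and a network call, and the platform handles this for you once pre-tool speech is configured.

```python
import asyncio


async def speak(text: str, spoken: list[str]) -> None:
    """Placeholder for TTS playback: records the phrase and pauses briefly."""
    spoken.append(text)
    await asyncio.sleep(0.01)


async def slow_lookup(order_id: str) -> dict:
    """Placeholder for a network call that takes a while to return."""
    await asyncio.sleep(0.05)
    return {"order_id": order_id, "status": "shipped"}


async def call_tool_with_speech(order_id: str, spoken: list[str]) -> dict:
    # Start the tool call first so it runs while the agent is speaking.
    tool_task = asyncio.create_task(slow_lookup(order_id))
    await speak("Let me check that for you, one moment.", spoken)
    result = await tool_task
    await speak(f"Your order is {result['status']}.", spoken)
    return result
```

Starting the task before speaking means the filler phrase overlaps the wait instead of adding to it.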

Testing Checklist

All configured languages work correctly with appropriate first messages.
The agent handles interruptions without breaking flow.
Tools trigger at the right moments and return expected results.
Transfer rules route to the correct agents.
The agent gracefully handles tool errors (e.g., API timeout).
Silence and edge cases are handled (caller goes quiet, background noise).
The agent stays within its guardrails (does not make up information, does not go off-topic).
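
For the tool error item, one possible shape of graceful handling is to bound the tool call with a timeout and fall back to a spoken apology. This is a minimal sketch using the standard library's `asyncio.wait_for`. The `call_tool_safely` helper and the fallback wording are hypothetical, not Voxagent behavior.

```python
import asyncio

# Hypothetical fallback phrase the agent would say when a tool times out.
FALLBACK = "I'm having trouble looking that up right now. Can I try again in a moment?"


async def call_tool_safely(tool, timeout_seconds: float):
    """Run an async tool with a timeout.

    Returns (True, result) on success, or (False, FALLBACK) if the tool
    does not finish within timeout_seconds.
    """
    try:
        result = await asyncio.wait_for(tool(), timeout=timeout_seconds)
        return True, result
    except asyncio.TimeoutError:
        return False, FALLBACK
```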
