Voice & Communication
WebRTC-based real-time voice communication in Voxagent
Voxagent uses WebRTC for real-time voice communication between users and AI agents.
How It Works
- A LiveKit room is created when an agent is dispatched
- The AI agent (Python worker) joins the room and starts listening
- The user connects via browser (widget or client) or phone
- Audio streams in both directions in real time
- The session is recorded automatically (audio egress)
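The steps above can be sketched as one linear sequence that also drives the session status (see the Session Lifecycle table below). This is an illustrative stand-in, not the actual Voxagent or LiveKit SDK code:

```python
# Hypothetical sketch of the dispatch flow; names and the session dict
# shape are illustrative assumptions.
def dispatch_session():
    """Walk a session through the flow described above."""
    session = {"status": "Pending", "events": []}

    # 1. A LiveKit room is created when the agent is dispatched
    session["status"] = "Dispatching"
    session["events"].append("room_created")

    # 2. The AI agent (Python worker) joins the room and starts listening
    session["status"] = "Active"
    session["events"].append("agent_joined")

    # 3. The user connects (browser widget/client or phone);
    #    audio then streams in both directions in real time
    session["events"].append("user_connected")

    # 4. The session is recorded automatically (audio egress)
    session["events"].append("recording_started")
    return session
```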
Voice Pipeline
User speaks → WebRTC audio → STT (speech-to-text) →
LLM processes text → generates response →
TTS (text-to-speech) → WebRTC audio → User hears
Speech-to-Text (STT)
Converts user speech to text for the LLM. STT provider is configurable per agent.
Large Language Model (LLM)
Processes the conversation and generates responses. LLM provider and model are configurable per agent (direct vendor API or aggregator).
Text-to-Speech (TTS)
Converts agent responses to speech. TTS provider is configurable per agent.
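A single conversational turn chains the three stages above: STT, then the LLM, then TTS. The sketch below shows that chaining with placeholder providers; the class and method names are hypothetical, standing in for whichever providers the agent is configured with:

```python
# Placeholder providers; in Voxagent each stage's provider is
# configurable per agent. These names are illustrative only.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return "what are your opening hours"  # stand-in transcription

class FakeLLM:
    def respond(self, text: str) -> str:
        return f"You asked: {text}. We are open 9 to 5."

class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # stand-in for synthesized audio

def one_turn(audio_in: bytes, stt, llm, tts) -> bytes:
    """One conversational turn: user audio in, agent audio out."""
    user_text = stt.transcribe(audio_in)   # speech-to-text
    reply_text = llm.respond(user_text)    # LLM generates the response
    return tts.synthesize(reply_text)      # text-to-speech
```

Swapping a provider only changes which object is passed in; the turn logic stays the same.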
Voice Activity Detection (VAD)
Detects when the user starts and stops speaking. Controls turn-taking behavior — when the agent should start or stop talking.
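To make the turn-taking idea concrete, here is a toy energy-threshold VAD. Production systems typically use model-based VAD, and the threshold and hangover values here are arbitrary assumptions:

```python
# Toy energy-threshold VAD illustrating turn-taking.
def detect_speech(frames, threshold=0.3, hangover=2):
    """Return a per-frame speaking/not-speaking state.

    frames: per-frame energy values in [0, 1].
    hangover: quiet frames tolerated before declaring the turn over,
              so brief pauses don't cut the user off.
    """
    speaking = False
    quiet = 0
    states = []
    for energy in frames:
        if energy >= threshold:
            speaking = True
            quiet = 0
        elif speaking:
            quiet += 1
            if quiet > hangover:
                speaking = False  # user stopped: agent may start talking
        states.append(speaking)
    return states
```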
Audio Recording
Every conversation is automatically recorded via LiveKit Egress. Recordings include:
- Full audio of the session (composite recording)
- Individual participant tracks
Recordings are stored in S3-compatible object storage.
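One plausible way to lay out the composite recording and the per-participant tracks in object storage is sketched below. The key structure and file extension are assumptions for illustration; the actual bucket layout is deployment-specific:

```python
# Hypothetical object-key layout for a session's recordings.
def recording_keys(session_id: str, participants: list) -> dict:
    composite = f"recordings/{session_id}/composite.ogg"
    tracks = {
        p: f"recordings/{session_id}/tracks/{p}.ogg" for p in participants
    }
    return {"composite": composite, "tracks": tracks}
```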
Real-Time Notifications
The platform uses SignalR to notify the frontend about session events:
- Agent Ready — agent has joined the room
- Session Completed — conversation ended, audio available
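A frontend typically registers one handler per event. The sketch below shows that pattern with a plain dictionary; the exact event names and payload fields are assumptions, and the wiring is illustrative rather than the actual SignalR client API:

```python
# Illustrative event-handler registry for the two session events.
handlers = {}

def on(event):
    """Register a handler for a named session event (hypothetical names)."""
    def register(fn):
        handlers[event] = fn
        return fn
    return register

@on("AgentReady")
def agent_ready(payload):
    # Agent has joined the room
    return f"agent joined room {payload['room']}"

@on("SessionCompleted")
def session_completed(payload):
    # Conversation ended, audio available
    return f"audio available at {payload['audio_url']}"

def dispatch(event, payload):
    return handlers[event](payload)
```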
Session Lifecycle
| Status | Description |
|---|---|
| Pending | Session created, room not yet ready |
| Dispatching | LiveKit room created, waiting for agent to join |
| Active | Agent joined, conversation in progress |
| Completed | Conversation ended normally |
| Failed | Error occurred during session |
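The statuses in the table imply an ordering. The transition map below is one plausible reading of the descriptions (any in-flight status can fail, terminal statuses have no exits); the exact rules Voxagent enforces are an assumption here:

```python
# Assumed allowed transitions between the session statuses above.
ALLOWED = {
    "Pending": {"Dispatching", "Failed"},
    "Dispatching": {"Active", "Failed"},
    "Active": {"Completed", "Failed"},
    "Completed": set(),  # terminal
    "Failed": set(),     # terminal
}

def can_transition(current: str, new: str) -> bool:
    return new in ALLOWED.get(current, set())
```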