Cedar’s voice system sends audio data to your backend for processing and expects either audio responses or structured JSON responses. This page covers how to implement the backend endpoint to handle voice interactions.
Endpoint Configuration
Automatic Configuration with Mastra
When using the Mastra provider, you can automatically configure the voice endpoint through the provider configuration:
<CedarCopilot
  llmProvider={{
    provider: 'mastra',
    baseURL: 'http://localhost:3000/api',
    voiceRoute: '/chat/voice-execute', // Automatically sets voice endpoint to: http://localhost:3000/api/chat/voice-execute
  }}>
  <YourApp />
</CedarCopilot>
This eliminates the need to manually call setVoiceEndpoint() and ensures consistency between your chat and voice endpoints.
Manual Configuration
For other providers or custom setups, configure the endpoint manually:
const voice = useCedarStore((state) => state.voice);
voice.setVoiceEndpoint('https://your-backend.com/api/voice');
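In a React app this is typically done once on mount. A minimal sketch, assuming useCedarStore is exported by the cedar-os package:
import { useEffect } from 'react';
import { useCedarStore } from 'cedar-os';

function VoiceSetup() {
  const voice = useCedarStore((state) => state.voice);

  useEffect(() => {
    // Point the voice system at your backend's voice endpoint
    voice.setVoiceEndpoint('https://your-backend.com/api/voice');
  }, [voice]);

  return null;
}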
Request Format
The voice system sends a multipart/form-data POST request to your configured endpoint with the following fields:
FormData {
  audio: Blob,      // WebM format with Opus codec
  settings: string, // JSON string of voice settings
  context: string,  // JSON string of additional context
  // Additional dynamic parameters may be included
  // based on the specific voice request configuration.
  // These params extend the BaseParams that are sent to regular chat endpoints.
}
Audio Data
- Format: WebM container with Opus codec
- Type: Binary blob from MediaRecorder API
- Quality: Optimized for speech recognition
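For reference, this is roughly how a WebM/Opus blob is produced in the browser. Cedar records the audio for you; the sketch below only illustrates the format your endpoint receives:
// Illustrative only: Cedar's voice system handles recording internally
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm;codecs=opus' });
const chunks: Blob[] = [];

recorder.ondataavailable = (event) => chunks.push(event.data);
recorder.onstop = () => {
  // This blob has the same shape as the `audio` field your backend receives
  const audioBlob = new Blob(chunks, { type: 'audio/webm' });
};

recorder.start();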
Voice Settings
{
  language: string;          // e.g., 'en-US'
  voiceId?: string;          // Optional voice ID for TTS
  pitch?: number;            // 0.5 to 2.0
  rate?: number;             // 0.5 to 2.0
  volume?: number;           // 0.0 to 1.0
  useBrowserTTS?: boolean;
  autoAddToMessages?: boolean;
}
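Because the settings arrive as a JSON string, it helps to parse them once and apply defaults and range clamping server-side. A minimal sketch; the helper name and defaults are ours, and the ranges follow the field comments above:
interface VoiceSettings {
  language: string;
  voiceId?: string;
  pitch?: number;
  rate?: number;
  volume?: number;
  useBrowserTTS?: boolean;
  autoAddToMessages?: boolean;
}

const clamp = (value: number, min: number, max: number) =>
  Math.min(max, Math.max(min, value));

// Hypothetical helper: parse the `settings` form field and normalize values
function parseVoiceSettings(raw: string): VoiceSettings {
  const parsed = JSON.parse(raw) as Partial<VoiceSettings>;
  return {
    language: parsed.language ?? 'en-US',
    voiceId: parsed.voiceId,
    pitch: clamp(parsed.pitch ?? 1.0, 0.5, 2.0),
    rate: clamp(parsed.rate ?? 1.0, 0.5, 2.0),
    volume: clamp(parsed.volume ?? 1.0, 0.0, 1.0),
    useBrowserTTS: parsed.useBrowserTTS,
    autoAddToMessages: parsed.autoAddToMessages,
  };
}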
Additional Context
The context includes Cedar’s additional context data (file contents, state information, etc.) that can be used to provide better responses.
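For example, a backend can fold this context into the model's system prompt. A minimal sketch, assuming the Express/multer parsing shown in the implementation examples below (the prompt wording is illustrative):
// `context` arrives as a JSON string of Cedar's additional context
const contextData = JSON.parse(req.body.context);

// One common pattern: surface the context to the model via the system prompt
const systemPrompt = [
  'You are a helpful voice assistant.',
  `Additional context from the app: ${JSON.stringify(contextData)}`,
].join('\n');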
Response Formats
Your backend can respond in several ways:
1. Direct Audio Response
Return audio data directly with the appropriate content type:
// Response headers
Content-Type: audio/mpeg
// or audio/wav, audio/ogg, etc.
// Response body: Raw audio data
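In Express this amounts to setting the content type and sending the raw buffer. A minimal fragment, assuming audioBuffer holds the generated audio inside an async route handler (the full flow appears under Implementation Examples below):
// Return the generated audio directly to Cedar
res.set('Content-Type', 'audio/mpeg');
res.send(audioBuffer);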
2. JSON Response with Audio URL
{
  "audioUrl": "https://example.com/response.mp3",
  "text": "Optional text transcript",
  "transcription": "What the user said"
}
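One way to produce this shape is to write the generated audio to a publicly served directory (or object storage) and return its URL. A minimal sketch inside an Express handler, assuming a static public/audio directory and audioBuffer, responseText, and transcription from earlier processing steps:
import { randomUUID } from 'crypto';
import { promises as fs } from 'fs';
import path from 'path';

// Persist the TTS output where your server (or a CDN) can serve it
const fileName = `${randomUUID()}.mp3`;
await fs.writeFile(path.join('public', 'audio', fileName), audioBuffer);

res.json({
  audioUrl: `https://your-backend.com/audio/${fileName}`,
  text: responseText,
  transcription: transcription.text,
});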
3. JSON Response with Base64 Audio
{
  "audioData": "base64-encoded-audio-data",
  "audioFormat": "mp3", // or "wav", "ogg"
  "text": "Response text",
  "transcription": "User input transcription"
}
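Building this shape from a TTS buffer only requires base64 encoding. A minimal fragment, assuming audioBuffer, responseText, and transcription come from the processing steps shown below:
res.json({
  audioData: audioBuffer.toString('base64'),
  audioFormat: 'mp3',
  text: responseText,
  transcription: transcription.text,
});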
4. Structured Response with Actions
{
  "transcription": "Show me the user dashboard",
  "text": "I'll show you the user dashboard",
  "object": {
    "type": "setState",
    "stateKey": "ui",
    "setterKey": "navigateTo",
    "args": ["/dashboard"]
  },
  "audioUrl": "https://example.com/response.mp3"
}
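On the backend this means attaching an action object to the JSON response when the assistant decides to act. A minimal sketch (the intent check is a hypothetical placeholder; the setState payload mirrors the example above):
// Hypothetical intent detection; in practice this could come from the model
// (e.g. structured output) or simple keyword matching
const wantsDashboard = /dashboard/i.test(transcription.text);

res.json({
  transcription: transcription.text,
  text: responseText,
  // Only include `object` when there is an action for the frontend to run
  ...(wantsDashboard && {
    object: {
      type: 'setState',
      stateKey: 'ui',
      setterKey: 'navigateTo',
      args: ['/dashboard'],
    },
  }),
});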
Implementation Examples
Node.js with Express
import express from 'express';
import multer from 'multer';
import OpenAI from 'openai';

const app = express();
const upload = multer();
const openai = new OpenAI();

app.post(
  '/api/chat/voice',
  upload.fields([
    { name: 'audio', maxCount: 1 },
    { name: 'settings', maxCount: 1 },
    { name: 'context', maxCount: 1 },
  ]),
  async (req, res) => {
    try {
      const audioFile = req.files.audio[0];
      const settings = JSON.parse(req.body.settings);
      const context = JSON.parse(req.body.context);

      // 1. Transcribe audio to text
      const transcription = await openai.audio.transcriptions.create({
        file: new File([audioFile.buffer], 'audio.webm', {
          type: 'audio/webm',
        }),
        model: 'whisper-1',
        language: settings.language?.split('-')[0] || 'en',
      });

      // 2. Process with your AI agent
      const messages = [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: transcription.text },
      ];
      const completion = await openai.chat.completions.create({
        model: 'gpt-4',
        messages: messages,
      });
      const responseText = completion.choices[0].message.content;

      // 3. Convert response to speech
      const speech = await openai.audio.speech.create({
        model: 'tts-1',
        voice: settings.voiceId || 'alloy',
        input: responseText,
        speed: settings.rate || 1.0,
      });
      const audioBuffer = Buffer.from(await speech.arrayBuffer());

      // 4. Return audio response
      res.set({
        'Content-Type': 'audio/mpeg',
        'Content-Length': audioBuffer.length,
      });
      res.send(audioBuffer);
    } catch (error) {
      console.error('Voice processing error:', error);
      res.status(500).json({ error: 'Failed to process voice request' });
    }
  }
);
Python with FastAPI
from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import JSONResponse, Response
from openai import OpenAI
import json
import io

app = FastAPI()
client = OpenAI()

@app.post("/api/chat/voice")
async def handle_voice(
    audio: UploadFile = File(...),
    settings: str = Form(...),
    context: str = Form(...)
):
    try:
        # Parse settings and context
        voice_settings = json.loads(settings)
        additional_context = json.loads(context)

        # Read audio data
        audio_data = await audio.read()

        # 1. Transcribe audio
        audio_file = io.BytesIO(audio_data)
        audio_file.name = "audio.webm"
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language=voice_settings.get("language", "en")[:2]
        )

        # 2. Process with AI
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": transcription.text}
            ]
        )
        response_text = response.choices[0].message.content

        # 3. Generate speech
        speech_response = client.audio.speech.create(
            model="tts-1",
            voice=voice_settings.get("voiceId", "alloy"),
            input=response_text,
            speed=voice_settings.get("rate", 1.0)
        )

        # 4. Return audio
        return Response(
            content=speech_response.content,
            media_type="audio/mpeg"
        )
    except Exception as e:
        return JSONResponse(status_code=500, content={"error": str(e)})
Mastra Agent Integration
When using Cedar-OS with the Mastra provider and voiceRoute configuration, your Mastra backend should handle requests at the specified route. Here’s how to implement the voice handler:
import { Agent } from '@mastra/core';
import { openai } from '@ai-sdk/openai';
import OpenAI from 'openai';

const agent = new Agent({
  name: 'voice-assistant',
  instructions: 'You are a helpful voice assistant.',
  model: openai('gpt-4o'),
});

// Separate OpenAI client for transcription (STT) and speech (TTS)
const openaiClient = new OpenAI();

// VoiceSettings matches the voice settings shape documented above
export async function handleVoiceRequest(
  audioBlob: Blob,
  settings: VoiceSettings,
  context: string
) {
  // 1. Transcribe audio
  const transcription = await openaiClient.audio.transcriptions.create({
    file: new File([audioBlob], 'audio.webm', { type: 'audio/webm' }),
    model: 'whisper-1',
    language: settings.language?.split('-')[0] || 'en',
  });

  // 2. Add context to the conversation
  const contextData = JSON.parse(context);
  const systemMessage = `Additional context: ${JSON.stringify(contextData)}`;

  // 3. Generate response with agent
  const response = await agent.generate([
    { role: 'system', content: systemMessage },
    { role: 'user', content: transcription.text },
  ]);

  // 4. Convert to speech
  const speech = await openaiClient.audio.speech.create({
    model: 'tts-1',
    voice: settings.voiceId || 'alloy',
    input: response.text,
    speed: settings.rate || 1.0,
  });

  return {
    transcription: transcription.text,
    text: response.text,
    // Base64-encode so the frontend can decode the JSON audioData field
    audioData: Buffer.from(await speech.arrayBuffer()).toString('base64'),
    audioFormat: 'mp3',
  };
}
// Example Mastra route setup
// If you configured voiceRoute: '/chat/voice-execute' in Cedar-OS,
// your Mastra backend should handle POST requests to this route
// (using the same express/multer setup as the Node.js example above):
app.post(
  '/chat/voice-execute',
  upload.fields([
    { name: 'audio', maxCount: 1 },
    { name: 'settings', maxCount: 1 },
    { name: 'context', maxCount: 1 },
  ]),
  async (req, res) => {
    const audioFile = req.files.audio[0];
    const settings = JSON.parse(req.body.settings);
    const context = req.body.context;

    const result = await handleVoiceRequest(
      new Blob([audioFile.buffer]),
      settings,
      context
    );

    // Return the structured response
    res.json(result);
  }
);
Error Handling
Your backend should handle various error cases:
app.post('/api/chat/voice', async (req, res) => {
  try {
    // ... processing logic (transcription, agent call, TTS)
  } catch (error) {
    console.error('Voice processing error:', error);

    // Return appropriate error response
    if (error.code === 'AUDIO_TRANSCRIPTION_FAILED') {
      return res.status(400).json({
        error: 'Could not understand the audio. Please try again.',
        code: 'TRANSCRIPTION_ERROR',
      });
    }

    if (error.code === 'TTS_FAILED') {
      return res.status(500).json({
        error: 'Failed to generate speech response.',
        code: 'TTS_ERROR',
        // Fallback to text-only response (values from the processing steps above)
        text: responseText,
        transcription: userInput,
      });
    }

    // Generic error
    res.status(500).json({
      error: 'Internal server error processing voice request',
      code: 'INTERNAL_ERROR',
    });
  }
});
CORS Configuration
Ensure your backend allows CORS for the frontend domain:
import cors from 'cors';

app.use(
  cors({
    origin: ['http://localhost:3000', 'https://your-app-domain.com'],
    methods: ['POST'],
    allowedHeaders: ['Content-Type', 'Authorization'],
  })
);
Audio Processing
- Use streaming transcription for real-time responses
- Implement audio compression to reduce bandwidth
- Cache TTS responses for common phrases
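For example, TTS output for repeated phrases can be reused instead of regenerated. A minimal in-memory sketch using the OpenAI client from the Express example above (a production setup would more likely use Redis or object storage):
import { createHash } from 'crypto';

// Simple in-memory cache keyed by voice + text
const ttsCache = new Map<string, Buffer>();

async function synthesizeWithCache(text: string, voiceId: string): Promise<Buffer> {
  const key = createHash('sha256').update(`${voiceId}:${text}`).digest('hex');

  const cached = ttsCache.get(key);
  if (cached) return cached;

  const speech = await openai.audio.speech.create({
    model: 'tts-1',
    voice: voiceId,
    input: text,
  });
  const audioBuffer = Buffer.from(await speech.arrayBuffer());

  ttsCache.set(key, audioBuffer);
  return audioBuffer;
}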
Response Optimization
- Stream audio responses when possible
- Use CDN for serving generated audio files
- Implement request queuing for high-traffic scenarios
// Example streaming response
app.post('/api/chat/voice-stream', async (req, res) => {
  res.writeHead(200, {
    'Content-Type': 'audio/mpeg',
    'Transfer-Encoding': 'chunked',
  });

  // generateAudioStream is a placeholder for your streaming TTS pipeline
  const audioStream = await generateAudioStream(transcription);

  audioStream.on('data', (chunk) => {
    res.write(chunk);
  });
  audioStream.on('end', () => {
    res.end();
  });
});
Security Best Practices
- Rate Limiting: Prevent abuse of voice endpoints
- Authentication: Verify user permissions
- Input Validation: Sanitize audio data and settings (see the sketch after the rate-limit example below)
- Content Filtering: Screen transcriptions for inappropriate content
import rateLimit from 'express-rate-limit';

const voiceRateLimit = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 50, // 50 requests per window
  message: 'Too many voice requests, please try again later.',
});

app.post('/api/chat/voice', voiceRateLimit, handleVoiceRequest);
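Input validation, mentioned in the list above, can run before any expensive transcription work. A minimal sketch; the size limit and error messages are illustrative:
// Hypothetical guard to run before transcription
const MAX_AUDIO_BYTES = 10 * 1024 * 1024; // 10 MB; adjust to your needs

function validateVoiceRequest(audioFile: { buffer: Buffer } | undefined, rawSettings: string) {
  if (!audioFile || audioFile.buffer.length === 0) {
    throw new Error('Missing audio data');
  }
  if (audioFile.buffer.length > MAX_AUDIO_BYTES) {
    throw new Error('Audio payload too large');
  }

  try {
    return JSON.parse(rawSettings);
  } catch {
    throw new Error('Settings must be valid JSON');
  }
}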
Testing Your Integration
Test your voice endpoint with curl:
# Send a test WebM/Opus recording (test-audio.webm) to your voice endpoint
curl -X POST http://localhost:3456/api/chat/voice \
  -F "audio=@test-audio.webm" \
  -F "settings={\"language\":\"en-US\",\"rate\":1.0}" \
  -F "context={\"files\":[],\"state\":{}}" \
  -o response.mp3
Or use the Cedar voice system directly for end-to-end testing:
// With the Mastra provider (automatic configuration)
<CedarCopilot
  llmProvider={{
    provider: 'mastra',
    baseURL: 'http://localhost:3456/api',
    voiceRoute: '/chat/voice-execute',
  }}>
  <YourApp />
</CedarCopilot>;
// Or manually configure the endpoint
const voice = useCedarStore((state) => state.voice);
voice.setVoiceEndpoint('http://localhost:3456/api/chat/voice');
await voice.requestVoicePermission();
voice.startListening();