Overview
Cloudflare's @cloudflare/voice package (part of the Agents platform, currently Beta) provides a complete voice pipeline — STT, LLM turn handling, TTS, and conversation persistence — built on Durable Objects. Evaluate how this fits as an optional voice capability for aegis-oss agents.
Docs: https://developers.cloudflare.com/agents/api-reference/voice/
What the API provides
// Server-side: extend a DO with voice
export class MyAgent extends withVoice(AgentBase) {
  // A bare `env` is not in scope in class fields; this assumes AgentBase exposes bindings as this.env
  transcriber = new WorkersAIFluxSTT(this.env.AI);
  tts = new WorkersAITTS(this.env.AI);

  async onTurn(transcript: string, ctx: TurnContext) {
    // plug into aegis-oss memory + dispatch here
    return this.kernel.dispatch(transcript);
  }
}

// Client-side: React hook
const { status, transcript, startCall, endCall } = useVoiceAgent(agentUrl);
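The onTurn stub in the server-side snippet leaves the "plug into aegis-oss memory + dispatch" step open. As a rough sketch of what that glue might look like, the snippet below builds a dispatch input from the transcript plus recalled memory entries. Every name in it (MemoryEntry, DispatchInput, buildDispatchInput, the maxEntries cap) is hypothetical; none of them is known to exist in @cloudflare/voice or aegis-oss.

```typescript
// Hypothetical types: neither @cloudflare/voice nor aegis-oss is known to define these.
interface MemoryEntry {
  text: string;
  score: number; // relevance to the current turn, higher is better
}

interface DispatchInput {
  prompt: string; // what the user said
  memoryContext: string[]; // memory snippets, most relevant first
}

/** Combine a voice transcript with the most relevant memory entries. */
function buildDispatchInput(
  transcript: string,
  memories: MemoryEntry[],
  maxEntries = 3,
): DispatchInput {
  const memoryContext = memories
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, maxEntries)
    .map((m) => m.text);
  return { prompt: transcript.trim(), memoryContext };
}

// Example: the higher-scored memory ends up first in the context.
const exampleInput = buildDispatchInput("  what's on my calendar?  ", [
  { text: "User prefers 24h time", score: 0.2 },
  { text: "User's calendar is in UTC", score: 0.9 },
]);
console.log(exampleInput.prompt, exampleInput.memoryContext);
```

Keeping this step a pure function would let the same logic be called from onTurn or from a transcription hook, whichever integration point is chosen.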
Key capabilities:
- Binary PCM audio over WebSocket, sentence-chunked streaming TTS
- Interruption handling (user speech aborts in-flight LLM/TTS)
- SQLite-backed conversation history via DO storage
- Hooks: afterTranscribe, beforeSynthesize, onInterrupt, lifecycle
- Workers AI STT/TTS (free tier), or third-party providers: Deepgram, ElevenLabs, Twilio
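Two of the capabilities listed above, sentence-chunked TTS and interruption handling, can be illustrated together. The sketch below is not how @cloudflare/voice implements them (the source does not show that). It only demonstrates the general idea: buffer streamed text until a sentence boundary, hand each complete sentence to a speak callback, and stop as soon as an abort flag is set. SentenceChunker, speakReply and InterruptSignal are hypothetical names, and the boundary regex is deliberately naive.

```typescript
// Illustrative only: these names are not part of @cloudflare/voice.

/** Minimal shape of an abort flag; a standard AbortSignal also satisfies it. */
interface InterruptSignal {
  readonly aborted: boolean;
}

/** Buffers streamed text and releases it one complete sentence at a time. */
class SentenceChunker {
  private buffer = "";

  /** Append a fragment and return any sentences it completed. */
  push(fragment: string): string[] {
    this.buffer += fragment;
    const sentences: string[] = [];
    // Naive boundary: ., ! or ? followed by whitespace (abbreviations will mis-split).
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.buffer.slice(start, end).trim());
      start = end;
    }
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  /** Return the trailing partial sentence once the stream has ended. */
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [rest] : [];
  }
}

/** Speak a streamed reply sentence by sentence, stopping once the user interrupts. */
function speakReply(
  fragments: string[],
  speak: (sentence: string) => void,
  signal: InterruptSignal,
): void {
  const chunker = new SentenceChunker();
  const emit = (sentences: string[]): void => {
    for (const sentence of sentences) {
      if (signal.aborted) return; // barge-in: drop everything not yet spoken
      speak(sentence);
    }
  };
  for (const fragment of fragments) {
    if (signal.aborted) return;
    emit(chunker.push(fragment));
  }
  emit(chunker.flush());
}
```

Chunking by sentence lets synthesis start before the full reply exists, and checking the flag before each sentence bounds how much audio can play after the user starts talking.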
Design questions for aegis-oss
- Should withVoice wrap the agent's DO class, or should aegis-oss expose a VoiceCapability mixin that delegates to CF's mixin internally?
- The afterTranscribe hook is the natural injection point for aegis-oss memory context. How do we surface this cleanly?
- onTurn → kernel dispatch: onTurn returns a string/stream. Does this map 1:1 to the aegis-oss executor interface, or do we need an adapter?
- @aegis-oss/voice addon package?
- VoiceClient (non-React) may matter for agents used outside browser contexts.
Opportunity
The CF voice API aligns closely with aegis-oss's DO-native architecture. Shipping voice as a first-class optional capability would differentiate aegis-oss from other agent frameworks that require external voice infra.
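To make "optional capability" a little more concrete, here is a minimal sketch of the VoiceCapability-mixin option raised in the design questions: an aegis-oss-owned mixin that delegates to the vendor mixin and adds its own surface on top. withVoiceStub is a hypothetical stand-in for Cloudflare's withVoice (its real typing and behavior are not reproduced here), and AgentBase, describeCapabilities, TextAgent and VoiceAgent are likewise invented for illustration.

```typescript
// Hypothetical stand-ins: withVoiceStub only imitates the role of Cloudflare's
// withVoice mixin; it is not the real implementation.
type Constructor<T = object> = new (...args: any[]) => T;

function withVoiceStub<TBase extends Constructor>(Base: TBase) {
  return class extends Base {
    voiceEnabled = true;
  };
}

/** aegis-oss-owned mixin: delegates to the vendor mixin, then adds its own surface. */
function VoiceCapability<TBase extends Constructor>(Base: TBase) {
  return class extends withVoiceStub(Base) {
    /** Hypothetical aegis-oss method; a real one might also wire memory hooks here. */
    describeCapabilities(): string[] {
      return this.voiceEnabled ? ["voice"] : [];
    }
  };
}

class AgentBase {
  constructor(public readonly name: string) {}
}

// Text-only agents skip the mixin; voice agents opt in with one wrapper.
class TextAgent extends AgentBase {}
class VoiceAgent extends VoiceCapability(AgentBase) {}

const demoAgent = new VoiceAgent("demo");
console.log(demoAgent.name, demoAgent.describeCapabilities());
```

Owning the outer mixin would keep the Beta package behind a single seam, so agents that never use voice would not depend on it and API changes would be absorbed in one place.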
Related