Research: Cloudflare Agents voice API as native voice layer for aegis-oss agents

## Overview

Cloudflare's `@cloudflare/voice` package (part of the Agents platform, currently in Beta) provides a complete voice pipeline built on Durable Objects: STT, LLM turn handling, TTS, and conversation persistence. This issue evaluates how it fits as an optional voice capability for aegis-oss agents.

Docs: https://developers.cloudflare.com/agents/api-reference/voice/

---

## What the API provides

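```ts
// Imports omitted. Voice names come from `@cloudflare/voice` as used in this issue;
// `AgentBase` and `this.kernel` are aegis-oss placeholders.

// Server-side: extend a DO with voice
export class MyAgent extends withVoice(AgentBase) {
  // A bare `env` is not in scope in a field initializer; read the binding off the instance
  transcriber = new WorkersAIFluxSTT(this.env.AI);
  tts = new WorkersAITTS(this.env.AI);

  async onTurn(transcript: string, ctx: TurnContext) {
    // plug into aegis-oss memory + dispatch here
    return this.kernel.dispatch(transcript);
  }
}

// Client-side: React hook
const { status, transcript, startCall, endCall } = useVoiceAgent(agentUrl);
```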

Key capabilities:
- Binary PCM audio over WebSocket, sentence-chunked streaming TTS
- Interruption handling (user speech aborts in-flight LLM/TTS)
- SQLite-backed conversation history via DO storage
- Hooks: `afterTranscribe`, `beforeSynthesize`, `onInterrupt`, plus lifecycle hooks
- Workers AI STT/TTS (free tier), or third-party providers (Deepgram, ElevenLabs, Twilio)
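
To make "sentence-chunked streaming TTS" concrete, here is a minimal sketch of the general technique: buffer LLM tokens and release text to the synthesizer one complete sentence at a time. `SentenceChunker` is a hypothetical helper written for this issue. It is not part of `@cloudflare/voice`, and the package's own chunking logic may differ.

```typescript
// Hypothetical helper, NOT an API from @cloudflare/voice.
// Buffers streamed tokens and releases complete sentences as soon as they close.
class SentenceChunker {
  private buffer = "";

  // Add one token; returns any sentences that are now complete.
  push(token: string): string[] {
    this.buffer += token;
    const sentences: string[] = [];
    // A sentence boundary is ., ! or ? followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.buffer.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next token.
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Call when the LLM stream ends to release the remaining text.
  flush(): string[] {
    const tail = this.buffer.trim();
    this.buffer = "";
    return tail ? [tail] : [];
  }
}
```

Each returned sentence could be handed to TTS immediately, so audio starts before the LLM finishes. The boundary regex is deliberately naive: abbreviations such as "e.g. " will split early, which a production chunker would need to handle.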

---

## Design questions for aegis-oss

- [ ] **Composition model**: Should `withVoice` wrap the agent's DO class, or should aegis-oss expose a `VoiceCapability` mixin that delegates to CF's mixin internally?
- [ ] **Memory integration**: The `afterTranscribe` hook is the natural injection point for aegis-oss memory context. How do we surface this cleanly?
- [ ] **`onTurn` → kernel dispatch**: `onTurn` returns a string/stream. Does this map 1:1 to the aegis-oss executor interface, or do we need an adapter?
- [ ] **Packaging**: Core bundle vs. optional `@aegis-oss/voice` addon package?
- [ ] **Provider abstraction**: Should aegis-oss abstract over CF's STT/TTS providers, or pass through CF's interface directly?
- [ ] **Framework-agnostic client**: `VoiceClient` (non-React) may matter for agents used outside browser contexts
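
As a starting point for the `onTurn` → kernel dispatch question, the sketch below shows one possible adapter shape. Everything in it is an assumption made for discussion: `Executor`, `ExecutorResult`, `toTextStream`, and `makeOnTurn` are hypothetical names, not the aegis-oss executor interface or anything exported by `@cloudflare/voice`. It only illustrates normalizing a "string or stream" result into a single stream type.

```typescript
// Hypothetical result shape: either a full reply or a stream of text chunks.
type ExecutorResult = string | AsyncIterable<string>;

// Hypothetical stand-in for the aegis-oss kernel/executor interface.
interface Executor {
  dispatch(input: string): Promise<ExecutorResult>;
}

// Normalize both result shapes into one async stream of text chunks.
async function* toTextStream(result: ExecutorResult): AsyncGenerator<string> {
  if (typeof result === "string") {
    yield result;
    return;
  }
  for await (const chunk of result) {
    yield chunk;
  }
}

// Build an onTurn-style handler that always returns a stream,
// regardless of which shape the executor produced.
function makeOnTurn(executor: Executor) {
  return async function onTurn(transcript: string): Promise<AsyncIterable<string>> {
    const result = await executor.dispatch(transcript.trim());
    return toTextStream(result);
  };
}
```

If the real executor already returns one of the shapes `onTurn` accepts, no adapter is needed. The sketch matters only if the two interfaces diverge, for example if the executor returns structured events rather than text.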

---

## Opportunity

The CF voice API aligns closely with aegis-oss's DO-native architecture. Shipping voice as a first-class optional capability would differentiate aegis-oss from other agent frameworks that require external voice infra.

---

## Related

- Internal AEGIS daemon integration: Stackbilt-dev/aegis#590
- CF Agents platform: https://developers.cloudflare.com/agents/
