Desktop application that captures two audio sources in real time (the user's microphone and system audio) and streams them separately to a backend for transcription of virtual meetings.
- Dual audio capture — user's microphone + remote meeting audio (Teams, Meet, Zoom)
- Low-latency streaming — 500ms chunks with binary PCM audio (not base64)
- Automatic reconnection — 60s buffer with exponential backoff on backend failures
- Real-time UI — VU meters per source + live transcripts from backend
- Secure architecture — Electron sandbox, `contextIsolation`, no `nodeIntegration`
```mermaid
flowchart TB
    subgraph Renderer["Renderer Process (React)"]
        Mic["getUserMedia<br/>(microphone)"]
        Sys["getDisplayMedia<br/>(WASAPI loopback)"]
        Worklet["AudioWorklet<br/>PCM 16-bit mono 16kHz<br/>500ms chunks"]
        Mic --> Worklet
        Sys --> Worklet
    end
    subgraph Preload["Preload (contextBridge)"]
        Bridge["IPC Bridge<br/>ArrayBuffer"]
    end
    subgraph Main["Main Process (Node.js)"]
        Buffer["Ring Buffer<br/>120 chunks ≈ 60s"]
        WS["WebSocket Client<br/>+ Heartbeat"]
        Buffer --> WS
    end
    Backend["Backend WebSocket<br/>(JSON metadata + binary PCM)"]
    Worklet -->|"ArrayBuffer"| Bridge
    Bridge -->|"IPC"| Buffer
    WS -->|"2 frames:<br/>1. JSON metadata<br/>2. Binary PCM"| Backend
```
For each 500ms chunk, 2 WebSocket frames are sent:
- JSON with metadata (`session_id`, `source`, `timestamp`, `size`)
- Binary with the PCM samples (16 kHz × 0.5 s × 2 bytes = 16,000 bytes, ~16 KB)
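A minimal sketch of how one chunk could be turned into those two frames. The source labels `"mic"`/`"system"` and a millisecond Unix `timestamp` are assumptions made for illustration; `PROTOCOL.md` is the authoritative schema, and `buildChunkFrames` is a hypothetical helper, not the app's actual code.

```typescript
// Hypothetical sketch; field types are assumptions, see PROTOCOL.md for the real schema.
type AudioSource = "mic" | "system"; // assumed source labels

interface ChunkMetadata {
  session_id: string;
  source: AudioSource;
  timestamp: number; // assumed: capture time in ms since the Unix epoch
  size: number; // byte length of the binary frame that follows
}

// Returns the two frames for one chunk: JSON text first, then the raw PCM bytes.
function buildChunkFrames(
  sessionId: string,
  source: AudioSource,
  pcm: Int16Array,
  timestamp: number = Date.now(),
): [string, ArrayBuffer] {
  // Copy exactly this view's samples into a standalone ArrayBuffer.
  const bytes = new ArrayBuffer(pcm.byteLength);
  new Int16Array(bytes).set(pcm);

  const meta: ChunkMetadata = { session_id: sessionId, source, timestamp, size: bytes.byteLength };
  return [JSON.stringify(meta), bytes];
}

// 500ms at 16 kHz = 8000 samples × 2 bytes = 16,000 bytes.
const SAMPLES_PER_CHUNK = 16000 * 0.5;
const [metaFrame, pcmFrame] = buildChunkFrames("demo-session", "mic", new Int16Array(SAMPLES_PER_CHUNK), 0);
console.log(metaFrame); // {"session_id":"demo-session","source":"mic","timestamp":0,"size":16000}
console.log(pcmFrame.byteLength); // 16000
```

With either the browser `WebSocket` or the `ws` package, the two values would go to two consecutive `send()` calls, text first, so the backend always reads the metadata before the PCM it describes.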
Desktop App:
- Electron 31
- React 18 + TypeScript
- Vite (dev server + HMR)
- Zustand (state management)
- WASAPI Loopback (system capture on Windows)
Tooling:
- pnpm (package manager)
- electron-vite (build)
- electron-builder (packaging)
- ESLint + TypeScript strict
- Node.js 20+
- pnpm 9+
- Windows 10/11 (for now; macOS/Linux on roadmap)
```bash
git clone https://github.com/your-org/speakflow-desktop.git
cd speakflow-desktop
pnpm install
pnpm run dev
```
The Electron window will open with DevTools.
- Click ⚙ Settings
- Enter:
  - User ID: your identifier
  - Backend WebSocket URL: `ws://localhost:8000/ws/audio` (or your backend)
- Save
- Select a microphone (or leave "System default")
- ▶ Start Capture
- Accept the Windows permission prompts
- You'll see:
  - 🟢 Connected (if the backend is running)
  - VU meters MIC / SYS
  - Real-time transcripts (if the backend sends them)
```
speakflow-desktop/
├── src/
│   ├── main/              # Electron main process (Node.js)
│   │   ├── audio/         # WASAPI capture + permissions
│   │   ├── websocket/     # WS client + reconnection + buffer
│   │   ├── auth/          # CredentialsProvider (static/JWT)
│   │   ├── config/        # settings.json in userData
│   │   ├── ipc/           # IPC handlers
│   │   └── logging/       # structured electron-log
│   ├── preload/           # contextBridge (secure API)
│   ├── renderer/          # React UI
│   │   ├── audio/         # getUserMedia + AudioWorklet
│   │   ├── components/    # UI controls + VU + transcripts
│   │   └── store/         # Zustand store
│   └── shared/            # Shared types + constants
├── PROTOCOL.md            # WebSocket technical spec
└── electron-builder.yml   # Windows packaging config
```
```bash
pnpm run build
```
Generates `out/` with compiled bundles.
```bash
pnpm run package
```
Generates the installer in `dist/` (`.exe` for Windows).
1. System audio capture without custom drivers
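Electron's `desktopCapturer` with `audio: 'loopback'` (passed back from the main process's `session.setDisplayMediaRequestHandler` when the renderer calls `getDisplayMedia`) captures WASAPI loopback audio on Windows without installing virtual drivers (VB-Audio Cable, etc.). Limitation: it captures the full system mix, not individual applications.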
2. Resampling to 16 kHz without custom DSP
`new AudioContext({ sampleRate: 16000 })` delegates resampling to Chromium's built-in audio pipeline. The quality is sufficient for ASR, and there are no custom polyphase decimators to maintain.
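Once the `AudioContext` delivers 16 kHz Float32 samples, the worklet's remaining job (per the diagram: 16-bit mono, 500ms chunks) is conversion and batching. A minimal sketch of that logic, kept free of Web Audio globals so it runs anywhere; `PcmChunker` and `floatTo16BitPcm` are hypothetical names. Inside an `AudioWorkletProcessor`'s `process()` method, the callback could forward each chunk with `this.port.postMessage(chunk, [chunk])`, which transfers the buffer instead of copying it; that is why a fresh buffer is allocated after every hand-off.

```typescript
// Hypothetical sketch of the worklet-side logic; assumes a 16 kHz mono AudioContext.
const SAMPLE_RATE = 16000;
const CHUNK_MS = 500;
const SAMPLES_PER_CHUNK = (SAMPLE_RATE * CHUNK_MS) / 1000; // 8000 samples = 16,000 bytes

// Clamp Float32 samples to [-1, 1] and scale them to signed 16-bit integers.
function floatTo16BitPcm(input: Float32Array, out: Int16Array, outOffset: number): void {
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[outOffset + i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
}

class PcmChunker {
  private readonly onChunk: (chunk: ArrayBuffer) => void;
  private chunk = new ArrayBuffer(SAMPLES_PER_CHUNK * 2);
  private view = new Int16Array(this.chunk);
  private filled = 0; // samples written into the current chunk

  constructor(onChunk: (chunk: ArrayBuffer) => void) {
    this.onChunk = onChunk;
  }

  // Accepts mono samples of any length (e.g. 128-frame render quanta).
  push(samples: Float32Array): void {
    let offset = 0;
    while (offset < samples.length) {
      const n = Math.min(samples.length - offset, SAMPLES_PER_CHUNK - this.filled);
      floatTo16BitPcm(samples.subarray(offset, offset + n), this.view, this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === SAMPLES_PER_CHUNK) {
        this.onChunk(this.chunk);
        // The consumer now owns the old buffer (it may have been transferred).
        this.chunk = new ArrayBuffer(SAMPLES_PER_CHUNK * 2);
        this.view = new Int16Array(this.chunk);
        this.filled = 0;
      }
    }
  }
}

// 63 render quanta × 128 frames = 8064 samples: one full chunk, 64 samples carried over.
const chunks: ArrayBuffer[] = [];
const chunker = new PcmChunker((c) => chunks.push(c));
for (let q = 0; q < 63; q++) chunker.push(new Float32Array(128).fill(0.5));
console.log(chunks.length, chunks[0].byteLength); // 1 16000
```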
3. Audio IPC without memory inflation
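Sending the ArrayBuffer over IPC (instead of base64 inside JSON) avoids base64's ~33% size inflation and the encode/decode cost. Electron serializes IPC arguments with the structured clone algorithm, which copies the raw bytes as-is rather than converting them to text.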
4. Reconnection without losing audio
Bounded ring buffer (60s) with exponential backoff. On reconnect, the queue is drained in order. If an outage outlasts the buffer, the oldest chunks are dropped first.
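A minimal sketch of both pieces, assuming the 120-chunk capacity from the diagram and illustrative backoff constants (500 ms base, 30 s cap) that are not taken from the app; `ChunkRingBuffer` and `backoffDelayMs` are hypothetical names. Overwriting the oldest slot keeps memory fixed during a long outage, and `drain()` returns chunks oldest-first so the backend receives them in capture order.

```typescript
// Hypothetical sketch: fixed-capacity queue that drops the oldest item on overflow.
class ChunkRingBuffer<T> {
  private readonly items: (T | undefined)[];
  private readonly capacity: number;
  private head = 0; // index of the oldest chunk
  private count = 0; // number of queued chunks
  dropped = 0; // chunks discarded because the buffer was full

  constructor(capacity: number) {
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  get size(): number {
    return this.count;
  }

  // Enqueue a chunk; when full, the write lands on the oldest slot and overwrites it.
  push(item: T): void {
    const tail = (this.head + this.count) % this.capacity;
    this.items[tail] = item;
    if (this.count === this.capacity) {
      this.head = (this.head + 1) % this.capacity; // oldest chunk was just overwritten
      this.dropped++;
    } else {
      this.count++;
    }
  }

  // Remove and return every queued chunk, oldest first (called after reconnecting).
  drain(): T[] {
    const out: T[] = [];
    while (this.count > 0) {
      out.push(this.items[this.head] as T);
      this.items[this.head] = undefined;
      this.head = (this.head + 1) % this.capacity;
      this.count--;
    }
    return out;
  }
}

// Delay before reconnect attempt `attempt` (0-based): baseMs × 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const queue = new ChunkRingBuffer<number>(120); // 120 × 500ms ≈ 60s
for (let i = 0; i < 130; i++) queue.push(i); // 65s of chunks during an outage
const replay = queue.drain();
console.log(replay.length, replay[0], queue.dropped); // 120 10 10
console.log([0, 1, 2, 10].map((a) => backoffDelayMs(a)).join(", ")); // 500, 1000, 2000, 30000
```

Clients commonly add random jitter to the delay so that many instances do not reconnect in lockstep; it is left out here to keep the output deterministic.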
5. Electron sandbox without breaking capture
Capture must run in the renderer because it depends on Web APIs (`getUserMedia`, `AudioWorklet`). The main process only handles the WebSocket and IPC, and the preload script exposes a minimal API through `contextBridge`.
- Binary vs base64: saves ~25% bandwidth and is directly compatible with Deepgram/AssemblyAI.
- 500ms chunks: balance between latency (low) and network overhead (acceptable).
- Two separate sources: enables diarization (user vs. remote) in backend without client-side VAD.
- pnpm: faster resolution than npm, but requires `--skipDepsCheck` due to an issue with Electron's build scripts.
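The ~25% figure in the first bullet follows from base64 emitting 4 characters for every 3 input bytes. A quick check for one 16,000-byte chunk using Node's `Buffer`; the zero-filled buffer is only a stand-in for real PCM, since only its length matters here.

```typescript
import { Buffer } from "node:buffer";

// One 500ms chunk: 16 kHz × 0.5 s × 2 bytes per 16-bit sample.
const pcm = Buffer.alloc(16_000);

// Base64 maps every 3 bytes to 4 ASCII characters (padded to a multiple of 4).
const base64Length = pcm.toString("base64").length;
const savedPercent = (1 - pcm.length / base64Length) * 100;

console.log(base64Length); // 21336
console.log(savedPercent.toFixed(1)); // 25.0
```

Seen the other way round, base64 inflates the payload by about 33%, before any JSON wrapping is added on top.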
```bash
# Fork + clone
git checkout -b feature/my-feature
# ... changes ...
pnpm run lint
pnpm run build
git commit -m "feat: description"
# Push + PR
```
Copyright (c) 2026 Laura Sot
Licensed under the PolyForm Noncommercial License 1.0.0. See LICENSE for the full text.
Noncommercial use only. Commercial use requires separate permission from the licensor.
Made with 🎵 and 🦜 (Chatot approves this audio capture)

Technical docs:
- WebSocket Protocol — JSON + binary schema, backend examples
