SpeakFlow Desktop

Desktop application that captures two audio sources in real time (the user's microphone and system audio) and streams them separately to a backend that transcribes virtual meetings.

Features

Dual audio capture — user's microphone + remote meeting audio (Teams, Meet, Zoom)
Low-latency streaming — 500ms chunks with binary PCM audio (not base64)
Automatic reconnection — 60s buffer with exponential backoff on backend failures
Real-time UI — VU meters per source + live transcripts from backend
Secure architecture — Electron sandbox, contextIsolation, no nodeIntegration

Architecture

flowchart TB
    subgraph Renderer["Renderer Process (React)"]
        Mic["getUserMedia<br/>(microphone)"]
        Sys["getDisplayMedia<br/>(WASAPI loopback)"]
        Worklet["AudioWorklet<br/>PCM 16-bit mono 16kHz<br/>500ms chunks"]
        
        Mic --> Worklet
        Sys --> Worklet
    end
    
    subgraph Preload["Preload (contextBridge)"]
        Bridge["IPC Bridge<br/>ArrayBuffer"]
    end
    
    subgraph Main["Main Process (Node.js)"]
        Buffer["Ring Buffer<br/>120 chunks ≈ 60s"]
        WS["WebSocket Client<br/>+ Heartbeat"]
        
        Buffer --> WS
    end
    
    Backend["Backend WebSocket<br/>(JSON metadata + binary PCM)"]
    
    Worklet -->|"ArrayBuffer"| Bridge
    Bridge -->|"IPC"| Buffer
    WS -->|"2 frames:<br/>1. JSON metadata<br/>2. Binary PCM"| Backend

Every 500ms, 2 WebSocket frames are sent:

JSON with metadata (session_id, source, timestamp, size)
Binary with PCM (~16 KB)

→ See full technical protocol
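As an illustration of the two-frame scheme above, here is a minimal sketch of how one 500 ms chunk could be turned into a JSON metadata frame plus a binary PCM frame. The field names come from the list above; the field types, the `"mic"`/`"sys"` labels, and the names `ChunkMetadata` and `buildChunkFrames` are assumptions made for this example, not the project's actual schema.

```typescript
// Assumed source labels; the real protocol may use different values.
type AudioSource = "mic" | "sys";

// Field names follow the metadata list above; the types are assumptions.
interface ChunkMetadata {
  session_id: string;
  source: AudioSource;
  timestamp: number; // assumed: milliseconds since epoch
  size: number; // byte length of the binary frame that follows
}

const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM, mono
const CHUNK_MS = 500;

// 16000 samples/s * 2 bytes * 0.5 s = 16000 bytes (~16 KB) per chunk.
const CHUNK_BYTES = (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MS) / 1000;

// Hypothetical helper: returns the two frames in the order they are sent.
function buildChunkFrames(
  sessionId: string,
  source: AudioSource,
  pcm: ArrayBuffer,
  timestamp: number
): [string, ArrayBuffer] {
  const metadata: ChunkMetadata = {
    session_id: sessionId,
    source,
    timestamp,
    size: pcm.byteLength,
  };
  return [JSON.stringify(metadata), pcm];
}
```

A sender would then pass the first element to the socket as a text frame and the second as a binary frame, in that order, so the backend can pair each binary payload with the metadata that precedes it.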

Tech Stack

Desktop App:

Electron 31
React 18 + TypeScript
Vite (dev server + HMR)
Zustand (state management)
WASAPI Loopback (system capture on Windows)

Tooling:

pnpm (package manager)
electron-vite (build)
electron-builder (packaging)
ESLint + TypeScript strict

Getting Started

Prerequisites

Node.js 20+
pnpm 9+
Windows 10/11 (for now; macOS/Linux on roadmap)

Installation

git clone https://github.com/your-org/speakflow-desktop.git
cd speakflow-desktop
pnpm install

Run

pnpm run dev

The Electron window will open with DevTools.

Configure

1. Click ⚙ Settings
2. Enter:
   - User ID: your identifier
   - Backend WebSocket URL: ws://localhost:8000/ws/audio (or your backend)
3. Save

Start Capturing

1. Select microphone (or leave "System default")
2. Click ▶ Start Capture
3. Accept Windows permissions
4. You'll see:
   - 🟢 Connected (if backend is running)
   - VU meters MIC / SYS
   - Real-time transcripts (if backend sends them)

Project Structure

speakflow-desktop/
├── src/
│   ├── main/              # Electron main process (Node.js)
│   │   ├── audio/         # WASAPI capture + permissions
│   │   ├── websocket/     # WS client + reconnection + buffer
│   │   ├── auth/          # CredentialsProvider (static/JWT)
│   │   ├── config/        # settings.json in userData
│   │   ├── ipc/           # IPC handlers
│   │   └── logging/       # structured electron-log
│   ├── preload/           # contextBridge (secure API)
│   ├── renderer/          # React UI
│   │   ├── audio/         # getUserMedia + AudioWorklet
│   │   ├── components/    # UI controls + VU + transcripts
│   │   └── store/         # Zustand store
│   └── shared/            # Shared types + constants
├── PROTOCOL.md            # WebSocket technical spec
└── electron-builder.yml   # Windows packaging config

Build & Package

Development Build

pnpm run build

Generates out/ with compiled bundles.

Production Installer

pnpm run package

Generates installer in dist/ (.exe for Windows).

Lessons Learned

Technical Challenges

1. System audio capture without custom drivers

Electron's desktopCapturer combined with audio: 'loopback' captures WASAPI loopback audio on Windows without installing virtual audio drivers (VB-Audio Cable, etc.). Limitation: it captures the full system mix, not per-process audio.

2. Resampling to 16 kHz without custom DSP

new AudioContext({ sampleRate: 16000 }) delegates resampling to Chromium's built-in audio pipeline. The quality is sufficient for ASR, and there are no custom polyphase decimators to maintain.
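With the context already running at 16 kHz, the remaining work is converting Float32 samples to 16-bit PCM and grouping them into 500 ms chunks. The sketch below shows one way to do that; `floatToInt16` and `PcmChunker` are hypothetical names, and the 128-sample block size mentioned in the comments is the usual Web Audio render quantum, not something confirmed by this project.

```typescript
const SAMPLES_PER_CHUNK = 8000; // 16000 Hz * 0.5 s

// Convert Float32 samples in [-1, 1] to signed 16-bit PCM, clamping out-of-range input.
function floatToInt16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  input.forEach((value, i) => {
    const s = Math.max(-1, Math.min(1, value));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  });
  return out;
}

// Accumulates small blocks (e.g. 128 samples per process() call in an AudioWorklet)
// and emits a copy of the buffer each time 8000 samples have been collected.
class PcmChunker {
  private buf = new Int16Array(SAMPLES_PER_CHUNK);
  private filled = 0;
  private onChunk: (chunk: Int16Array) => void;

  constructor(onChunk: (chunk: Int16Array) => void) {
    this.onChunk = onChunk;
  }

  push(block: Float32Array): void {
    const pcm = floatToInt16(block);
    let offset = 0;
    while (offset < pcm.length) {
      // Copy as much as fits in the current chunk; the rest starts the next one.
      const n = Math.min(pcm.length - offset, SAMPLES_PER_CHUNK - this.filled);
      this.buf.set(pcm.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === SAMPLES_PER_CHUNK) {
        this.onChunk(this.buf.slice()); // slice() copies, so the buffer can be reused
        this.filled = 0;
      }
    }
  }
}
```

Because 8000 is not a multiple of 128, chunks straddle block boundaries, which is why the loop splits a block across two chunks when needed.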

3. Audio IPC without memory inflation

Sending an ArrayBuffer over IPC (instead of base64 in JSON) reduces overhead. Electron IPC copies the ArrayBuffer with the structured clone algorithm, so the audio never goes through a base64 or JSON encoding step.

4. Reconnection without losing audio

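A bounded ring buffer (120 chunks, about 60s) holds audio while the client retries with exponential backoff. On reconnect, the queue is drained in order. If the buffer overflows, the oldest chunks are dropped first (rare with 500ms chunks).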
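The buffering and retry behaviour described here can be sketched as follows. This is an illustration under assumptions, not the project's code: `BoundedChunkQueue` and `backoffDelayMs` are hypothetical names, the 1 s base delay and 30 s cap are made-up values, and the queue uses a plain array with shift() rather than a true circular buffer, which is simple and adequate for about 120 items.

```typescript
// Bounded FIFO with drop-oldest overflow.
class BoundedChunkQueue<T> {
  private items: T[] = [];
  private readonly capacity: number;
  dropped = 0; // number of chunks discarded because the queue was full

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length >= this.capacity) {
      this.items.shift(); // drop the oldest chunk to make room
      this.dropped++;
    }
    this.items.push(item);
  }

  // Send everything that was buffered, oldest first, and leave the queue empty.
  drain(send: (item: T) => void): void {
    while (this.items.length > 0) {
      send(this.items.shift() as T);
    }
  }

  get size(): number {
    return this.items.length;
  }
}

// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
// Both defaults are assumptions for this example.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

With a capacity of 120 and 500 ms chunks, the queue holds roughly the last 60 s of audio for one source. A production version would probably also add random jitter to the delay so that many clients do not reconnect at the same instant.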

5. Electron sandbox without breaking capture

Capture must run in the renderer, because it depends on Web APIs (getUserMedia, AudioWorklet). The main process only handles WebSocket + IPC. The preload script exposes a minimal API through contextBridge.

Design Decisions

Binary vs base64: base64 inflates payloads by about 33%, so sending raw binary saves ~25% bandwidth; it is also directly compatible with Deepgram/AssemblyAI.
500ms chunks: balance between latency (low) and network overhead (acceptable).
Two separate sources: enables diarization (user vs. remote) in backend without client-side VAD.
pnpm: faster resolution than npm, but requires --skipDepsCheck due to electron build scripts issue.
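The bandwidth figure in the binary vs base64 decision can be checked with a short calculation. The sketch below assumes standard padded base64 (4 output characters per 3 input bytes) and the 16000-byte chunk size implied by 500 ms of 16-bit mono audio at 16 kHz; `base64Length` is a hypothetical helper name.

```typescript
// Length in characters of standard padded base64 for a given byte count.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const rawBytes = 16000; // one 500 ms chunk of 16-bit mono PCM at 16 kHz
const encodedBytes = base64Length(rawBytes); // 21336 characters

// Fraction of bandwidth saved by sending raw binary instead of base64.
const saving = 1 - rawBytes / encodedBytes; // about 0.25

console.log(`base64: ${encodedBytes} B, binary: ${rawBytes} B, saving: ${(saving * 100).toFixed(1)}%`);
```

The saving is measured against the base64 size, which is why a roughly 33% inflation corresponds to a roughly 25% reduction. The JSON quoting around a base64 string would add a little more on top.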

Contributing

# Fork + clone
git checkout -b feature/my-feature
# ... changes ...
pnpm run lint
pnpm run build
git commit -m "feat: description"
# Push + PR

License

Licensed under the PolyForm Noncommercial License 1.0.0. See LICENSE for the full text.

Noncommercial use only. Commercial use requires separate permission from the licensor.

Made with 🎵 and 🦜 (Chatot approves this audio capture)

Technical docs:

WebSocket Protocol — JSON + binary schema, backend examples
