Work in progress. Core pipeline works — audio capture, ASR, text injection — but the app is not feature-complete or polished. Expect rough edges, missing UI, and breaking changes.
Local-first macOS voice transcription. Hold a hotkey, speak, release — transcribed text appears in whatever app has focus. No cloud, no subscription, no data leaves the machine.
Requirements: macOS 14+, Xcode 16, XcodeGen
- Hotkey-driven transcription — Press a key, speak, release — text appears in the focused window
- Offline ASR — Whisper.cpp inference (no cloud API calls)
- LLM cleanup — Optional post-processing with phi-3.5-mini or llama-3.2-1b
- Three injection modes — AccessibilityAPI, Pasteboard, or keystroke fallback
- Network Activity Monitor — Live egress audit log with timestamp, URL, and event type; menu bar dot badge (green/orange)
- Hold-vs-Tap state machine — Right Option >300ms = Push-to-Talk; double-tap within 300ms = Hands-Free toggle (configurable 250–500ms)
- Voice Snippets — Trigger-phrase text expansion with security blocks (password managers, terminals)
- Multi-hotkey bindings — Four configurable hotkey actions with live key capture and conflict detection
- Transcription history — SQLite database with raw transcript, cleaned text, source app, timestamp
```
Right Option (hold) ──► AudioCaptureService
                          │ AVAudioEngine tap → resample to 16 kHz mono Float32
                          ▼
Right Option (release) ──► WhisperBridge
                          │ whisper.cpp inference (CGGML + CWhisper)
                          ▼
                        LlamaCppProvider (optional)
                          │ llama.cpp cleanup pass (CGGML + CLlama)
                          ▼
                        TextInjector
                          │ 1. AX direct write (no clipboard clobber)
                          │ 2. Pasteboard + simulated Cmd-V (restores prior clipboard)
                          │ 3. CGEvent keyboard simulation (character-by-character fallback)
                          ▼
                        Focused window receives text
```
| Module | Language | Role |
|---|---|---|
| CGGML | C/C++ | Shared ggml runtime — Metal, Accelerate, CPU kernels. Linked once to avoid duplicate symbol errors. |
| CWhisper | C/C++ | whisper.cpp ASR — depends on CGGML. |
| CLlama | C/C++ | llama.cpp LLM inference — depends on CGGML. |
| WhisKeyCore | Swift | Business logic: pipeline, audio capture, ASR bridge, LLM provider, injection, history, settings. |
| WhisKeyUI | Swift/SwiftUI | Menu bar, floating HUD, settings, model picker. |
| WhisKeyApp | Swift | Entry point — wires pipeline, hotkey, and UI. |
AudioCaptureService — installs an AVAudioEngine tap on the input node, converts native device format to 16 kHz / mono / Float32 using AVAudioConverter, accumulates samples in a lock-protected buffer, and publishes normalized RMS for the HUD waveform.
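The normalized RMS published for the HUD waveform can be sketched in isolation. This is an illustrative sketch, not WhisKey's actual code; `normalizedRMS` and the -60 dB floor are assumptions for the example.

```swift
import Foundation

// Hypothetical sketch of HUD level metering: root-mean-square of a
// Float32 sample buffer, mapped through decibels and clamped into 0...1
// so a waveform view can render it directly.
func normalizedRMS(_ samples: [Float], floorDb: Float = -60) -> Float {
    guard !samples.isEmpty else { return 0 }
    let meanSquare = samples.reduce(0) { $0 + $1 * $1 } / Float(samples.count)
    let rms = meanSquare.squareRoot()
    // Convert to decibels; guard against log10(0).
    let db = 20 * log10(max(rms, .leastNormalMagnitude))
    // Map [floorDb, 0] dB onto [0, 1] and clamp.
    return max(0, min(1, (db - floorDb) / -floorDb))
}
```

Mapping through decibels rather than using raw RMS keeps quiet speech visible on the meter, since perceived loudness is roughly logarithmic.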
WhisperBridge — Swift actor wrapping the CWhisper C bridge. Loads the GGML model lazily from ~/Library/Application Support/WhisKey/Models/. Runs inference on a DispatchQueue.global continuation with a 30-second timeout task racing the inference task.
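The "timeout task racing the inference task" pattern can be sketched with a throwing task group: two children start, the first to finish wins, and the loser is cancelled. A minimal sketch assuming structured concurrency; `withTimeout` is an illustrative name, not the app's API.

```swift
import Foundation

struct TimeoutError: Error {}

// Sketch of a race between real work and a sleeping watchdog task.
// Whichever child finishes first provides the result; the other is cancelled.
func withTimeout<T: Sendable>(
    seconds: Double,
    _ work: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self) { group in
        group.addTask { try await work() }
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            throw TimeoutError()
        }
        // next() returns the first completed child (or rethrows its error).
        let result = try await group.next()!
        group.cancelAll()
        return result
    }
}
```

Note that cancellation is cooperative: a C-level whisper.cpp call won't stop mid-inference, so the timeout bounds how long the caller waits, not the inference itself.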
LlamaCppProvider — Swift actor wrapping the CLlama C bridge. Loads a GGUF model lazily from the same Models directory. Builds a cleanup prompt from CleanupProfile (filler removal, punctuation, tone style) and calls llama_bridge_complete. Silently passes through raw transcript if the model file is absent.
TextInjector — Tries three injection strategies in order: (1) AXInjector sets kAXValueAttribute directly on the captured AXUIElement — no clipboard involved; (2) PasteboardInjector writes to NSPasteboard, posts Cmd-V, then restores prior contents; (3) CGEventInjector synthesizes individual keystrokes for apps that reject both.
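The first-success fallback order can be modeled as a strategy chain. The protocol and type names here are hypothetical, not WhisKey's actual types:

```swift
// Illustrative sketch of a first-success fallback chain: try each
// injection strategy in priority order and stop at the first that works.
protocol Injector {
    func inject(_ text: String) -> Bool   // true on success
}

struct InjectorChain {
    let strategies: [Injector]            // e.g. [AX, pasteboard, CGEvent]

    @discardableResult
    func inject(_ text: String) -> Bool {
        for s in strategies where s.inject(text) { return true }
        return false                      // every strategy refused the text
    }
}
```

Ordering matters: the AX write is tried first because it is the only strategy with no observable side effects (no clipboard churn, no synthetic key events).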
HotkeyManager — CGEventTap at .cgSessionEventTap. Default hotkey: Right Option (0x3D). Supports push-to-talk (hold/release) and toggle (press/press) modes.
TranscriptionPipeline — Async orchestrator. Snapshots the focused AXUIElement immediately on hotkey release (before transcription latency causes focus to shift), runs ASR, optionally runs LLM cleanup, dispatches output, and persists history to SQLite via GRDB.
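The ordering the pipeline relies on can be sketched abstractly: the injection target is captured before the slow ASR call, so a later focus change cannot redirect the output. Everything below is a stand-in (a `String` for the `AXUIElement`, closure parameters for the real services):

```swift
// Hypothetical sketch of the pipeline's ordering guarantee.
func runPipeline(
    snapshotFocusedElement: () -> String,          // stand-in for AXUIElement capture
    transcribe: () async -> String,                // ASR (may take seconds)
    cleanup: ((String) async -> String)? = nil,    // optional LLM pass
    inject: (String, String) -> Void               // (target, text)
) async {
    let target = snapshotFocusedElement()          // 1. capture focus immediately
    var text = await transcribe()                  // 2. slow: whisper inference
    if let cleanup { text = await cleanup(text) }  // 3. optional cleanup
    inject(target, text)                           // 4. deliver to the *saved* target
}
```

Swapping steps 1 and 2 would reintroduce the bug the snapshot exists to prevent: text landing in whatever window happens to be focused seconds later.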
| Mode | Behavior |
|---|---|
| `activeWindow` | Inject into focused app |
| `clipboard` | Write to clipboard only |
| `both` | Inject + write to clipboard |
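The three modes reduce to a simple dispatch. The enum cases mirror the table; the function and closure names are illustrative, not the app's actual API:

```swift
// Hypothetical sketch of output-mode dispatch.
enum OutputMode: String {
    case activeWindow, clipboard, both
}

func dispatch(_ text: String, mode: OutputMode,
              inject: (String) -> Void,
              copyToClipboard: (String) -> Void) {
    switch mode {
    case .activeWindow: inject(text)
    case .clipboard:    copyToClipboard(text)
    case .both:         inject(text); copyToClipboard(text)
    }
}
```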
When llmEnabled is on, the pipeline applies a CleanupProfile post-pass:
- Remove fillers — strips "um", "uh", "like", "you know", etc.
- Add punctuation — capitalizes sentences, adds periods/commas
- Tone style — `casual`, `formal`, `literal` (passthrough), or context-inferred from the active app bundle ID
- Raw mode — bypass LLM entirely regardless of other settings
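For intuition, a non-LLM version of the filler pass can be sketched with a word-boundary regex. This is illustrative only; the app delegates cleanup to the LLM, and the filler list here is an assumption:

```swift
import Foundation

// Naive sketch of "remove fillers": delete whole-word filler matches
// (plus a trailing comma/space) and trim the result.
func stripFillers(_ text: String) -> String {
    let fillers = ["um", "uh", "like", "you know"]
    let pattern = "\\b(?:"
        + fillers.map(NSRegularExpression.escapedPattern(for:)).joined(separator: "|")
        + ")\\b[,]?\\s*"
    let cleaned = text.replacingOccurrences(
        of: pattern, with: "",
        options: [.regularExpression, .caseInsensitive])
    return cleaned.trimmingCharacters(in: .whitespaces)
}
```

The sketch also shows why the real pass uses an LLM: word matching cannot tell filler "like" from the verb in "I'd like to", whereas a language model can.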
Default model: phi-3.5-mini-q4_k_m.gguf. Falls back to passthrough if the GGUF file is absent.
```
brew install xcodegen
git clone <repo-url>
cd whiskey
xcodegen generate
```

```
# Whisper ASR (required)
bash Scripts/download-models.sh tiny          # ~75 MB — default
bash Scripts/download-models.sh base          # ~142 MB
bash Scripts/download-models.sh small         # ~466 MB
bash Scripts/download-models.sh medium        # ~1.5 GB

# LLM cleanup (optional — app works without it)
bash Scripts/download-models.sh phi-3.5-mini  # ~2.4 GB — recommended
bash Scripts/download-models.sh llama-3.2-1b  # ~0.8 GB — fastest
```

Models are saved to `~/Library/Application Support/WhisKey/Models/`.
Open WhisKey.xcodeproj in Xcode, select the WhisKey scheme, and press Cmd-R.
The post-build script auto-installs the app to ~/Applications/WhisKey.app.
```
swift build -c release
```

Note: the Metal shader (`default.metallib`) is compiled by Xcode's pre-build script. Running via the SPM CLI requires manually compiling and placing `default.metallib` in the same directory as the binary, or disabling Metal GPU offload.
On first launch, grant these when prompted:
| Permission | Why |
|---|---|
| Microphone | Audio capture |
| Input Monitoring | Global hotkey via CGEventTap |
| Accessibility | AX-based text injection into other apps |
To reset permissions:
```
bash Scripts/reset-permissions.sh
```

- Launch WhisKey — appears as a menu bar icon.
- Push-to-Talk: Hold Right Option >300ms and speak — transcribed text types into the focused window.
- Hands-Free Toggle: Double-tap Right Option within 300ms to toggle hands-free transcription mode (if enabled in Settings → Hotkey).
- Accidental taps <80ms are silently discarded.
The Right Option key disambiguates hold duration:
- Hold >300ms — Push-to-Talk transcription
- Double-tap within 300ms — Hands-Free toggle (when enabled)
- Disambiguation window — Configurable 250–500ms (default 300ms) in Settings → Hotkey
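The disambiguation rules above can be modeled as a small classifier over press timestamps. The thresholds mirror the documented defaults; the type and case names are hypothetical, not WhisKey's actual code:

```swift
import Foundation

// Illustrative model of hold-vs-tap disambiguation.
enum HotkeyGesture: Equatable {
    case ignored          // accidental tap (<80 ms)
    case pending          // lone short tap: waiting for a possible second tap
    case pushToTalk       // held past the hold threshold
    case handsFreeToggle  // two short taps inside the disambiguation window
}

struct HotkeyClassifier {
    var holdThreshold: TimeInterval = 0.300    // hold >300 ms = push-to-talk
    var tapFloor: TimeInterval = 0.080         // taps <80 ms are discarded
    var doubleTapWindow: TimeInterval = 0.300  // configurable 250–500 ms
    private var lastTapEnd: TimeInterval?

    // Classify one press given its key-down/key-up timestamps (seconds).
    mutating func classify(downAt: TimeInterval, upAt: TimeInterval) -> HotkeyGesture {
        let held = upAt - downAt
        if held < tapFloor { return .ignored }
        if held > holdThreshold {
            lastTapEnd = nil
            return .pushToTalk
        }
        // Short tap: a second one starting inside the window completes a double-tap.
        if let prev = lastTapEnd, downAt - prev <= doubleTapWindow {
            lastTapEnd = nil
            return .handsFreeToggle
        }
        lastTapEnd = upAt
        return .pending
    }
}
```

A lone short tap deliberately classifies as `pending` rather than triggering anything, which is what makes the 80 ms floor and the double-tap window coexist without false positives.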
| Setting | Tab | Options |
|---|---|---|
| Whisper model | Transcription | tiny, base, small, medium, large |
| Language hint | Transcription | BCP-47 code, or blank for auto-detect |
| LLM cleanup | Transcription | On / Off |
| LLM model | Transcription | phi-3.5-mini, llama-3.2-1b |
| Tone style | Transcription | casual, formal, literal |
| Remove fillers | Transcription | On / Off |
| Add punctuation | Transcription | On / Off |
| Raw mode | Transcription | Bypass LLM regardless of other settings |
| Output mode | Transcription | Active window / Clipboard / Both |
| Notifications | Transcription | On / Off |
| Primary hotkey | Hotkeys | Key capture with conflict detection |
| Hands-Free toggle | Hotkeys | On / Off — enables Right Option double-tap |
| Disambiguation window | Hotkeys | 250–500ms slider (default 300ms) |
| Voice snippets | Snippets | Trigger phrase → expansion text mapping |
| Egress audit | Privacy | Live log of all outbound network calls |
Every transcription is stored in SQLite (~/Library/Application Support/WhisKey/history.db) with the raw transcript, cleaned text, source app bundle ID, and timestamp. Viewable from the menu bar.
```
swift test
# or in Xcode: Cmd-U
```

Tests live in the `WhisKeyCoreTests` target and run against the host app binary (WhisKey.app).