Get on Mac app store (safe: completely sandboxed, no internet access, these claims are vetted by Apple App Store reviewers). Contributor warning: 90% vibecoded.
A lightweight, privacy-focused voice-to-text app for macOS. Built with Swift and powered by the Parakeet engine for fast, local speech recognition.
- 98% vibecoded, inspired by SuperWhisper.
- 0 API calls, all local
This is what it looks like when you're speaking (but smaller and translucent):
And configuring:
- 🎤 Global Hotkey Recording - Press `⌘⇧Space` to start/stop recording from anywhere
- 🤖 Local AI Processing - Uses Parakeet engine running entirely on your Mac
- 🔒 Privacy First - No data leaves your device, all processing is local
- ⚡ Fast & Lightweight - Optimized for Apple Silicon Macs
- 📋 Smart Text Insertion - Automatically inserts transcribed text where your cursor is
- 🎯 Simple & Elegant - Clean SwiftUI interface, minimal configuration needed
- macOS 14.0 (Sonoma) or later
- Apple Silicon Mac (M1/M2/M3) recommended for best performance
- Microphone access permission
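
The first two requirements above can be checked from a terminal before installing. The following is a minimal sketch and not part of the project: the helper names (`major_of`, `minor_of`, `version_at_least`, `preflight`) are made up for illustration. It relies only on `sw_vers -productVersion` and `uname -m`, both of which ship with macOS. Microphone permission cannot be checked this way; macOS asks for it when the app first records.

```shell
#!/bin/sh
# Hypothetical preflight helpers (illustration only, not shipped with Superhoarse).

# Print the major component of a dotted version, e.g. "14.2.1" -> "14".
major_of() { echo "${1%%.*}"; }

# Print the minor component, defaulting to 0 when absent, e.g. "14" -> "0".
minor_of() {
  case $1 in
    *.*) rest=${1#*.}; echo "${rest%%.*}" ;;
    *)   echo 0 ;;
  esac
}

# Succeed if version $1 is at least major.minor $2 (patch level is ignored).
version_at_least() {
  have_major=$(major_of "$1"); have_minor=$(minor_of "$1")
  want_major=$(major_of "$2"); want_minor=$(minor_of "$2")
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
  else
    [ "$have_minor" -ge "$want_minor" ]
  fi
}

# Check the OS version (hard requirement) and CPU architecture (recommendation).
preflight() {
  os_version=$(sw_vers -productVersion 2>/dev/null) || {
    echo "sw_vers not found: this does not look like macOS"
    return 1
  }
  version_at_least "$os_version" "14.0" || {
    echo "macOS 14.0 or later is required (found $os_version)"
    return 1
  }
  [ "$(uname -m)" = "arm64" ] || echo "warning: Apple Silicon is recommended"
  echo "ok"
}
```

To use it, source the file in a shell and run `preflight`. An Intel Mac on a supported macOS version only gets the warning, because Apple Silicon is recommended rather than required.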
Easy: Get on Mac app store (safe: completely sandboxed, no internet access, these claims are vetted by Apple App Store reviewers). Contributor warning: 90% vibecoded.
# Install from the local formula (after cloning repo)
brew install --formula Formula/superhoarse.rb
# Or create a tap for easier installation:
# brew tap mheiber/superhoarse https://github.com/mheiber/superhoarse
# brew install superhoarse

Note: The Homebrew build downloads ~607MB of speech recognition models from HuggingFace during installation.
See "Contributing" below.
- Launch the app - It will appear in your menu bar
- Grant permissions when prompted:
  - Microphone access for recording
  - Accessibility access for text insertion
- Start recording - Press `⌘⇧Space` anywhere on your Mac
- Speak clearly - The recording indicator will show in the app
- Stop recording - Press `⌘⇧Space` again
- Get results - Transcribed text appears where your cursor was
See ./user_flows.md for development-focused details about user interactions.
Superhoarse is built with simplicity and performance in mind:
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   SwiftUI App   │────│    App State     │────│ HotKey Manager  │
│                 │    │                  │    │                 │
│ - Recording UI  │    │ - Coordinates    │    │ - Global ⌘⇧⎵    │
│ - Status Window │    │   all components │    │ - Carbon Events │
│ - Settings      │    │ - State Updates  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                 ┌──────────────┼──────────────┐
                 │              │              │
        ┌────────▼────────┐    ┌▼──────────────▼┐
        │ Audio Recorder  │    │     Speech     │
        │                 │    │   Recognizer   │
        │ - AVFoundation  │    │                │
        │ - 16kHz PCM     │    │ - Parakeet     │
        │ - Temp files    │    │ - Local models │
        └─────────────────┘    └────────────────┘
- FluidAudio (~0.12.1) - High-performance audio processing for speech recognition
  - Why chosen: Optimized Swift implementation for real-time audio processing
  - Alternatives considered: Built-in SpeechRecognizer (cloud-based), Core ML models (larger)
  - Benefits: Runs entirely offline, optimized for Apple Silicon, minimal latency
- SwiftUI - Modern declarative UI framework
  - Why chosen: Native performance, minimal code, automatic dark mode support
  - Alternative: AppKit (more verbose, procedural)
- AVFoundation - Audio recording and processing
  - Why chosen: Native macOS audio framework with hardware optimization
  - Alternative: Third-party audio libraries (unnecessary complexity)
- Carbon - Low-level system access for global hotkeys
  - Why chosen: Only way to register system-wide keyboard shortcuts on macOS
  - Alternative: None for global hotkeys
- Parakeet Engine - Lightweight, fast speech recognition
  - Why chosen: Optimized for real-time transcription with low latency
  - Alternatives: Cloud-based services (privacy concerns), larger models (slower)
  - Trade-offs: Parakeet provides excellent speed while maintaining good accuracy
# Run in development mode
make run
# Clean build artifacts
make clean
# Set up development environment
make setup

- ✅ No telemetry or analytics
- ✅ No network requests at runtime - the app works completely offline (models are fetched only at build or Homebrew install time)
- ✅ All processing happens locally on your Mac
- ✅ Audio recordings are temporary and deleted immediately
- ✅ No user data is stored or transmitted
- macOS only - Built specifically for Apple's ecosystem
- Multilingual - Supports 25 European languages (English, French, German, Spanish, etc.). Chinese/Japanese/Korean not yet supported (would require Qwen3-ASR model).
All written by Claude. Hey Claude:
- Keep it simple - Favor readable, succinct code over clever optimizations
- Minimal dependencies - Only add dependencies that provide substantial value
- Privacy first - No features that compromise local-only processing
- Performance matters - Optimize for Apple Silicon architecture
- Test both manually and with automated tests.
  - `swift test`. Try to keep tests at a high level and from the user's point of view, avoiding testing implementation details.
  - Manual(ish) testing. User flows are in ./user_flows.md
We have both a Swift build (fast, not as realistic for app store) and an Xcode build (slower, closer to what we distribute).
- `make build`
  - Models (~607MB) download automatically on first build from HuggingFace
- `swift test` to test. USE THIS FREQUENTLY DURING DEVELOPMENT
- `./test_e2e.sh` for an end-to-end test that actually turns on the speakers and listens
- `./test_e2e_xcode.sh` e2e test for the Xcode build
Automatic download: Models download automatically during build if missing or corrupt. First build will download ~607MB from HuggingFace.
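
Because the first build pulls ~607MB, it can be worth confirming there is room before starting. The following is a minimal sketch that assumes nothing about the project's Makefile; `free_mb` and `has_room_for_models` are hypothetical helper names. It uses `df -Pk`, the POSIX output format, where the fourth column of the second line is the available space in 1024-byte blocks. A filesystem name containing spaces would shift the columns, so treat this as a rough check.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only, not part of the Superhoarse build).

# Print the free space, in whole MB, on the volume that contains directory $1.
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed if the volume containing directory $1 has at least $2 MB free.
has_room_for_models() {
  [ "$(free_mb "$1")" -ge "$2" ]
}
```

For example, `has_room_for_models Sources/Resources 607 || echo "not enough disk space for the models"` could run before `make build`. The directory must already exist for `df` to report on it.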
Update to latest models:
make update-models

Model verification:
# Quick check (uses cached marker, ~0.01s)
make check-models
# Full validation (re-hashes all files, ~2-3s)
make models

Troubleshooting:
- Models won't download: Check internet connection and HuggingFace status
- Checksum mismatch: Run `make update-models` to force re-download
- Disk space: Models require ~607MB in Sources/Resources/
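
The split between a quick check that trusts a cached marker and a full validation that re-hashes every file could work roughly as sketched here. The Makefile's real implementation is not shown in this README, so everything in the sketch is an assumption: the manifest name `checksums.sha256`, the marker name `.models-verified`, and the function names are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of marker-based vs. full model verification.
# Assumed layout: a directory holding model files plus a manifest
# "checksums.sha256" with lines of the form "<sha256>  <file name>".

# Print the SHA-256 of a file, using whichever tool is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# Full validation: re-hash every file listed in the manifest.
# On success, store the manifest's own hash as the "verified" marker.
full_check() {
  fc_dir=$1
  rm -f "$fc_dir/.models-verified"   # a failed run must not leave a stale marker
  [ -f "$fc_dir/checksums.sha256" ] || { echo "no manifest in $fc_dir"; return 1; }
  while read -r expected name || [ -n "$expected" ]; do
    [ -n "$name" ] || continue
    [ -f "$fc_dir/$name" ] || { echo "missing: $name"; return 1; }
    [ "$(sha256_of "$fc_dir/$name")" = "$expected" ] || {
      echo "checksum mismatch: $name"
      return 1
    }
  done <"$fc_dir/checksums.sha256"
  sha256_of "$fc_dir/checksums.sha256" >"$fc_dir/.models-verified"
}

# Quick check: hash only the small manifest and compare it to the marker.
quick_check() {
  qc_dir=$1
  [ -f "$qc_dir/.models-verified" ] || return 1
  [ -f "$qc_dir/checksums.sha256" ] || return 1
  [ "$(cat "$qc_dir/.models-verified")" = "$(sha256_of "$qc_dir/checksums.sha256")" ]
}
```

In this design the quick check never reads the model files, which is why it is fast, but it also cannot notice a model file that was corrupted after the marker was written. Only the full validation catches that.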
`make install` to build, copy to Applications folder, and start the app. Requires `sudo`. Human-only.
Closed-source Software we build for money
- Inspired by SuperWhisper by Neil Chudleigh
- Built with FluidAudio for high-performance audio processing
- Uses Parakeet engine for fast, local speech recognition

