Windows Only - This tool uses Windows-specific APIs for global hotkeys and system tray integration. It will not run on macOS or Linux.
Hold a hotkey to record your voice, release to transcribe and type the text into any application. Or enable Open Mic Mode for hands-free dictation with wake word activation.
Uses OpenAI Whisper (via faster-whisper) with GPU acceleration for fast, accurate transcription. Runs quietly in the system tray.
- Push-to-talk - Hold hotkey to record, release to transcribe
- Open Mic Mode - Say a wake word to start recording; silence ends the segment automatically
- System tray icon - Green (ready), blue (open mic listening), red (recording), yellow (processing)
- Audio feedback - Ascending/descending tones confirm recording start/stop in open mic mode
- Configurable hotkey - Default Alt+F, fully customizable
- Multiple languages - English, auto-detect, or 50+ language codes
- Model selection - Trade speed vs accuracy (tiny → large)
- GPU acceleration - CUDA support for fast transcription
- Offline capable - After initial model download
The first time you run the tool, it downloads the Whisper speech model. Model sizes: tiny ~75MB, base ~150MB, small ~500MB, medium ~1.5GB, large ~3GB. After download, the model is cached and works offline.
Option A - Microsoft Store (easiest):
- Open Microsoft Store
- Search "Python 3.13"
- Click Install
Option B - python.org:
- Go to https://www.python.org/downloads/
- Download Python 3.13+
- Run installer
- IMPORTANT: Check "Add Python to PATH" during installation
Option C - winget:
winget install Python.Python.3.13

For fast transcription, you need an NVIDIA GPU with CUDA support. Typical transcription times vary by audio length and hardware (GPU: ~1-3 seconds for short phrases, CPU: ~5-15 seconds).
Check if you have an NVIDIA GPU:
- Press Win+X → Device Manager
- Expand "Display adapters"
- Look for "NVIDIA GeForce..." or "NVIDIA RTX..."
If you have an NVIDIA GPU, install the CUDA Toolkit:
- Go to https://developer.nvidia.com/cuda-downloads
- Select Windows → x86_64 → 11 → exe (local)
- Download and install (use Express installation)
- Restart your computer
No NVIDIA GPU? The tool will automatically use CPU mode. It's slower but works fine.
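The automatic CPU fallback can be approximated with a quick probe like the sketch below. `pick_device` is illustrative, not a function this tool ships, and the presence of `nvidia-smi` only approximates a usable CUDA install:

```python
import shutil
import subprocess

def pick_device() -> tuple:
    """Return a (DEVICE, COMPUTE_TYPE) pair for config.py.

    Uses nvidia-smi as a cheap probe for a working NVIDIA driver;
    falls back to CPU with int8 when none is found.
    """
    if shutil.which("nvidia-smi"):
        try:
            subprocess.run(["nvidia-smi"], capture_output=True, check=True)
            return "cuda", "float16"
        except (OSError, subprocess.CalledProcessError):
            pass
    return "cpu", "int8"
```

The two return pairs match the DEVICE/COMPUTE_TYPE combinations the installer writes to src/config.py.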
- Run the installer: double-click install.bat
- Follow the prompts:
- Choose your hotkey (default: Alt+F)
- Select model size (tiny/base/small/medium/large)
- Select language (English/auto-detect/other)
- Verify installation: double-click test-install.bat
The installer is idempotent - safe to run multiple times to reconfigure.
- Start the tool: double-click start-dictation.bat
  A startup healthcheck opens in the command window:
- Validates microphone stream health
- Prompts you to say "check 1 2 3"
- Verifies transcription before background launch (up to 3 attempts)
- Displays previous runtime state from %LOCALAPPDATA%\VoiceDictation\state.json

A colored circle then appears in your system tray.
Optional launch modes:
- start-dictation.bat --healthcheck-only (run checks and exit)
- start-dictation.bat --skip-healthcheck (launch immediately)
- Dictate:
- Hold your hotkey (default: Alt+F)
- Icon turns red - speak clearly
- Release the hotkey
- Icon turns yellow while processing
- Text appears in your active window
- Icon returns to green
Note: Clipboard backup is disabled by default. Set USE_CLIPBOARD = True in src/config.py if you want every transcript copied to the clipboard as a backup.
- Open Mic Mode (hands-free):
- Right-click tray icon → Enable Open Mic Mode
- Icon turns blue — listening for wake word
- Say "hey Jarvis" (or your configured wake word)
- Ascending tone plays — recording started
- Speak naturally — the system detects when you stop talking
- Descending tone plays — recording ended, transcribing
- Text appears in your active window
- Icon returns to blue, ready for the next wake word
Both modes work simultaneously — you can use the hotkey anytime even with open mic enabled. Open mic mode uses OpenWakeWord for lightweight, local wake word detection on CPU.
First-time setup: install the dependency with pip install openwakeword (or re-run install.bat). Pre-trained wake words include "hey Jarvis", "alexa", "hey Mycroft", and others.
- Check status: hover over the tray icon for current settings
- Run healthcheck while the app is running: right-click tray icon → Run Startup Healthcheck...
- Exit: right-click tray icon → Exit
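The "silence ends the segment" behavior in open mic mode is energy-based. Here is a minimal sketch of the idea; the function names and the 0.08 s frame size are illustrative assumptions, and the real logic lives in src/voice_dictation/wake_word_listener.py:

```python
import math

def rms(frame):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def segment_ended(frames, threshold=0.01, timeout_s=2.0, frame_s=0.08):
    """Return True once trailing silence lasts at least timeout_s.

    Mirrors the idea behind WAKE_WORD_SILENCE_TIMEOUT_S: each frame
    below the energy threshold extends a silence run; any loud frame
    resets it.
    """
    silent = 0.0
    for frame in frames:
        silent = silent + frame_s if rms(frame) < threshold else 0.0
        if silent >= timeout_s:
            return True
    return False
```

Raising WAKE_WORD_SILENCE_TIMEOUT_S gives you longer pauses mid-sentence before the segment closes, at the cost of a slower hand-off to transcription.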
This tool is optimized for single-speaker dictation in reasonably quiet environments. Transcription accuracy may degrade in the following scenarios:
- Background conversations - Multiple voices speaking simultaneously
- Noisy environments - Loud ambient noise, machinery, or music
- Distant microphone placement - Speaking far from the microphone
For best results, use a close-range microphone and minimize background noise. If you experience issues with ambient noise, enable noise reduction in src/config.py:
NOISE_REDUCTION = True

This applies audio filtering before transcription, which can help with stationary noise (fans, AC, traffic hum) but may not fully isolate your voice from other speakers.
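Separately from noise reduction, the config exposes a noise gate (NOISE_GATE_THRESHOLD and NOISE_GATE_PEAK_MULTIPLIER). The sketch below shows one plausible way those two values could combine; the exact formula here is an assumption for illustration, not this tool's implementation:

```python
import math

def passes_noise_gate(samples, rms_threshold=0.01, peak_multiplier=3.0):
    """Decide whether a clip contains speech worth transcribing.

    Quiet clips are dropped unless their peak is well above the gate,
    which lets short bursts of speech through (the stated intent of
    NOISE_GATE_PEAK_MULTIPLIER). Illustrative only.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return rms >= rms_threshold or peak >= rms_threshold * peak_multiplier
```

Run src/calibrate.py to tune the gate threshold for your microphone rather than guessing values.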
Transcribed text is injected into the active window using simulated keystrokes with a 10ms delay between characters. This deliberate throttling prevents crashes in certain terminal applications (notably Claude Code's TUI).
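The throttled injection can be sketched as follows; `type_text` and its `emit` callback are illustrative stand-ins for the tool's actual keystroke code (which uses the keyboard library), shown so the truncation and pacing logic is visible on its own:

```python
import time

def type_text(text, emit, max_chars=1000, delay_s=0.01):
    """Emit text one character at a time with a small pause between
    characters, truncating at max_chars (cf. MAX_TYPED_CHARS).

    `emit` stands in for the real per-character keystroke call.
    """
    typed = []
    for ch in text[:max_chars]:
        emit(ch)
        typed.append(ch)
        time.sleep(delay_s)
    return "".join(typed)
```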
If you want clipboard backup, enable it in src/config.py:
USE_CLIPBOARD = True

Run uninstall.bat to remove:
- Virtual environment
- Configuration
- Log files
- Optionally: downloaded model cache (~500MB-3GB)
The keyboard library used for hotkey detection installs a global keyboard hook via the Windows API. This hook receives all keystrokes system-wide, not just the configured hotkey. The application only processes press/release events for the configured hotkey and discards all other key events immediately. No keystrokes are logged, stored, or transmitted. Users should ensure the application is installed from a trusted source and review the src/dictate.py hotkey handler if concerned.
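The press/release filtering described above might look like the sketch below. HotkeyFilter is a hypothetical illustration of the discard-everything-else behavior; the real handler is in src/dictate.py:

```python
class HotkeyFilter:
    """Track only the configured combo; every other key event is
    returned unprocessed (None) and never stored."""

    def __init__(self, combo=("alt", "f")):
        self.combo = set(combo)
        self.down = set()
        self.active = False

    def on_event(self, key, is_press):
        if key not in self.combo:
            return None  # discarded immediately, never logged
        if is_press:
            self.down.add(key)
        else:
            self.down.discard(key)
        if not self.active and self.down == self.combo:
            self.active = True
            return "start_recording"
        if self.active and self.down != self.combo:
            self.active = False
            return "stop_recording"
        return None
```

The key point is that non-hotkey events exit on the first line: nothing about them is retained.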
- Install Python (see Prerequisites above)
- Make sure "Add to PATH" was checked during installation
- Try restarting your computer
- Right-click start-dictation.bat → "Run as administrator"
- You're probably in CPU mode
- Install NVIDIA drivers and CUDA toolkit (see Prerequisites)
- Re-run install.bat to reconfigure
- Check your microphone is connected
- Windows Settings → Sound → Input → Make sure correct mic is selected
- Windows Settings → Privacy → Microphone → Allow apps to access microphone
- Try unplugging and replugging USB microphones
- Check your internet connection
- If behind a corporate proxy, you may need to configure proxy settings
- The model downloads to %USERPROFILE%\.cache\huggingface - ensure you have enough free space
- Try again later if HuggingFace servers are slow
- Make sure CUDA Toolkit is installed (not just NVIDIA drivers)
- Restart your computer after installing CUDA
- Re-run install.bat to reconfigure
- Check the system tray overflow (^ arrow near clock)
- Some systems hide new tray icons by default
After installation, edit src/config.py or re-run install.bat.
install.bat writes the core keys; dictate.py also supports additional optional keys below.
If an optional key is missing, runtime defaults are applied automatically.
HOTKEY = 'alt+f' # Your recording hotkey
MODEL_SIZE = 'small' # tiny, base, small, medium, large
LANGUAGE = 'en' # 'en', 'auto', 'es', 'fr', 'de', 'ja', etc.
DEVICE = 'cuda' # 'cuda' or 'cpu'
COMPUTE_TYPE = 'float16' # 'float16' for GPU, 'int8' for CPU
AUDIO_DEVICE = None # Saved device name (auto-managed)
AUDIO_DEVICE_HOSTAPI = None # Saved host API (auto-managed)
AUDIO_DEVICE_INDEX = None # Saved preferred index (auto-managed)
AUDIO_DEVICE_UID = None # Saved stable device fingerprint (auto-managed)
NOISE_REDUCTION = False # True to filter background noise
NOISE_GATE_THRESHOLD = 0.01 # Base RMS gate threshold
NOISE_GATE_PEAK_MULTIPLIER = 3.0 # Allow low-RMS clips if peaks indicate speech
USE_CLIPBOARD = False # Copy text to clipboard as backup (opt-in)
LOG_TRANSCRIPT_TEXT = False # Log transcript snippets (debug only)
MAX_TYPED_CHARS = 1000 # Maximum characters typed per utterance
LOG_LEVEL = 'INFO' # Runtime log verbosity (DEBUG/INFO/WARNING/ERROR)
VOCABULARY = '' # Custom words: 'Claude, Anthropic, TypeScript'
# Open Mic / Wake Word Mode
WAKE_WORD_ENABLED = False # Start with open mic on at launch
WAKE_WORD_MODEL = 'hey_jarvis_v0.1' # Pre-trained model or path to custom .onnx
WAKE_WORD_THRESHOLD = 0.5 # Detection confidence (0.0-1.0)
WAKE_WORD_SILENCE_TIMEOUT_S = 2.0 # Seconds of silence to end a segment
WAKE_WORD_OUTPUT_FILE = None # Path to transcription log file (optional)

Whisper sometimes misinterprets names and technical terms. Add them to VOCABULARY in config.py:
VOCABULARY = 'Claude, Anthropic, TypeScript, GitHub, JIRA'

This primes the model to recognize these spellings correctly. Just list the words separated by commas - the model learns the correct spelling from context. Restart the app after changing.
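A common way to prime Whisper with custom terms is faster-whisper's initial_prompt parameter on transcribe(). Here is a hedged sketch of how the VOCABULARY string could be turned into such a prompt; build_initial_prompt and the "Glossary:" wording are assumptions for illustration, not this tool's actual code:

```python
def build_initial_prompt(vocabulary):
    """Turn the comma-separated VOCABULARY string into a short prompt
    that can prime Whisper with correct spellings."""
    words = [w.strip() for w in vocabulary.split(",") if w.strip()]
    if not words:
        return None
    return "Glossary: " + ", ".join(words) + "."
```

The resulting string would then be passed as transcribe(..., initial_prompt=prompt) when calling faster-whisper.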
Detailed runtime and dependency architecture diagrams are documented in ARCHITECTURE.md.
voice-dictation/
|-- src/ # application modules
| |-- voice_dictation/ # internal split modules (recording/watchdogs)
|-- tests/ # pytest suites
|-- install.bat
|-- start-dictation.bat
|-- test-install.bat
|-- launch.cmd
|-- uninstall.bat
|-- README.md
|-- ARCHITECTURE.md
|-- TESTING-PLAN.md
| File | Purpose |
|---|---|
| src/dictate.py | Main orchestration + compatibility facade (tray/hotkey/runtime wiring) |
| src/voice_dictation/recording_pipeline.py | Extracted recording/transcription preparation pipeline helpers |
| src/voice_dictation/watchdog_loops.py | Extracted recording and stream watchdog/recovery loops |
| src/voice_dictation/wake_word_listener.py | Wake word detection loop with energy-based silence timeout |
| src/voice_dictation/shared_audio_buffer.py | Thread-safe FIFO for audio frames between producer/consumer |
| src/voice_dictation/wake_word_mode.py | Wake word mode state management (enable/disable/toggle) |
| src/voice_dictation/transcription_file_writer.py | Plain text transcription log file writer |
| src/diagnostics.py | Diagnostic log and runtime-state analyzer |
| src/startup_healthcheck.py | Operational preflight + spoken phrase verification |
| src/calibrate.py | Noise gate calibration workflow |
| src/audio_device_identity.py | Shared microphone identity, UID, and fallback resolution helpers |
| src/runtime_state.py | Shared runtime state read/write helpers (%LOCALAPPDATA%\VoiceDictation\state.json) |
| src/app_state.py | Dataclass-backed runtime state container for dictation lifecycle |
| src/audio_stream_manager.py | Centralized stream open/close/switch/reopen behavior |
| src/audio_capture.py | Shared audio probe and fixed-duration capture helpers |
| src/transcription_io.py | Shared temporary WAV transcription pipeline for Whisper |
| src/config_store.py | Structured config.py read/update helpers with atomic writes |
| src/speak.py | Text-to-speech utility (see below) |
| src/claude_status_tts.py | Claude statusline helper (not part of dictation runtime) |
| src/config.py | Your settings (generated) |
| src/config.example.py | Configuration template |
| install.bat | Setup wizard (safe to re-run) |
| uninstall.bat | Remove installation |
| start-dictation.bat | Launch the tool |
| launch.cmd | Minimal headless launcher (starts pythonw directly, no startup healthcheck prompt) |
| test-install.bat | Verify installation |
- tests/test_dictate.py - core tray/hotkey/transcription behavior
- tests/test_dictate_runtime_guards.py - runtime race/restart/guard regression coverage
- tests/test_startup_healthcheck.py - startup healthcheck behavioral flow
- tests/test_diagnostics.py - diagnostics parsing/aggregation/report output
- tests/test_calibrate.py - calibration and fallback behavior
- tests/test_audio_device_identity.py - shared device identity and resolution logic
- tests/test_runtime_state.py - shared runtime state persistence helpers
- tests/test_config_store.py - config assignment upsert and literal formatting
- tests/test_audio_stream_manager.py - stream lifecycle abstraction behavior
- tests/test_audio_capture.py - shared capture/probe helpers
- tests/test_transcription_io.py - temporary WAV transcription helper behavior
- tests/test_app_state.py - runtime state container defaults
- tests/test_claude_status_tts.py - command hardening for Claude statusline helper
- tests/test_wake_word_listener.py - wake word detection, silence timeout, frame exclusion
- tests/test_wake_word_components.py - file writer, shared buffer, mode toggle
- tests/test_watchdog_recovery.py - device re-resolve after persistent recovery failures
A standalone utility for text-to-speech using Microsoft Edge's neural voices:
.venv\Scripts\python src\speak.py "Hello, this is a test."
This is a separate tool from the main dictation functionality and is included for convenience.
MIT License - see LICENSE for details.
This project is built on excellent open source software:
- OpenAI Whisper - Speech recognition model (MIT)
- faster-whisper - Optimized Whisper implementation (MIT)
- pystray - System tray integration (LGPLv3)
- Pillow - Icon generation (HPND)
- keyboard - Global hotkeys (MIT)
- sounddevice - Audio capture (MIT)
- noisereduce - Audio noise reduction (MIT)
- openWakeWord - Wake word detection (Apache 2.0)
- edge-tts - Text-to-speech (MIT)