GitHub - 0xbrando/dictate: Push-to-talk voice dictation for macOS. 100% local, free, open source. Apple Silicon MLX. No cloud, no subscription.

Push-to-talk voice dictation that runs entirely on your Mac.
No cloud. No API keys. No subscriptions.

Hold a key → Speak → Release → Clean text appears wherever your cursor is.

Why Dictate?

65ms voice-to-text on Apple's Neural Engine — faster than a keystroke
Zero GPU RAM for STT — the Neural Engine has its own dedicated memory
100% local — audio and text never leave your Mac
Free and open source — no subscriptions, no API keys, no accounts
LLM text cleanup — local model fixes grammar and punctuation automatically
52+ languages — real-time translation between any supported pair

Your M-series Mac has a 16-core Neural Engine doing nothing. Dictate puts it to work.

Install

pip install dictate-mlx
dictate

That's it. Dictate launches in the background and appears in your menu bar. Close the terminal — it keeps running.

For Qwen3-ASR support (52-language STT engine):

pip install dictate-mlx[qwen3-asr]

This is still local-only. No API key is required; the extra installs the MLX Qwen3-ASR runtime.

Homebrew source install is available for users who prefer Brew. It builds the Swift ANE helper and installs the Python app into a Homebrew-managed virtualenv:

brew tap 0xbrando/dictate
brew install dictate

The cask/DMG path is still planned; pip install dictate-mlx remains the simplest install path.

macOS will prompt for Accessibility and Microphone permissions on first run. Dictate downloads only the selected default models, then caches them in ~/.cache/huggingface/. Other cleanup models are one-click downloads from the Quality menu.

Install from source

git clone https://github.com/0xbrando/dictate.git
cd dictate
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
dictate

Requirements

macOS with Apple Silicon (any M-series chip)
Python 3.11+
~3GB RAM with ANE (STT runs on Neural Engine, only LLM needs GPU memory)

Features

Push-to-Talk

Hold a key, speak, release. Text appears wherever your cursor is.

Action	Key
Record	Hold Left Control
Lock recording (hands-free)	Press Space while holding PTT
Stop locked recording	Press PTT again

The PTT key is configurable: Left Control, Right Control, Right Command, or either Option key.

LLM Text Cleanup

The thing that sets Dictate apart. Most dictation tools give you raw transcription. Dictate pipes through a local LLM that fixes grammar, adds punctuation, and formats properly.

Short phrases (≤15 words) skip cleanup for instant speed. Longer dictation gets the full treatment.

Local STT Engine Stack

Dictate is designed around local speech recognition. Switch anytime from the menu bar.

Engine	Speed	Languages	Notes
ANE / FluidAudio	~65ms	25	Default — Parakeet TDT v3 through Core ML on Apple Neural Engine
Qwen3-ASR 0.6B	~50ms	52	Best broad multilingual local path — includes CJK, Arabic, Hindi
Parakeet TDT v3 0.6B	~50ms	25	Fast European-language GPU/MLX fallback
Whisper Large V3 Turbo	~300ms	99+	Compatibility fallback for maximum language coverage

ANE is the default. It runs speech recognition through FluidAudio on Apple's Neural Engine — a dedicated chip that sits idle during most tasks. This frees the GPU entirely for LLM text cleanup, so STT and LLM run concurrently with zero contention. The result: 65-106ms transcription on real speech.

Qwen3-ASR is the recommended local multilingual engine — 52 languages including Japanese, Chinese, and Korean at Parakeet-level speed. Requires pip install dictate-mlx[qwen3-asr].

Dictate auto-switches engines based on language: ANE/Parakeet for European languages, Qwen3-ASR for CJK and others, Whisper as the universal fallback.

Writing Styles

Style	What it does
Clean Up	Fixes punctuation and capitalization — keeps your words
Professional	Polished tone and grammar
Bullet Points	Rewrites as concise bullet points

Toggle LLM cleanup off from the menu bar for raw transcription output.

Real-Time Translation

Speak in one language, get output in another. 12 languages supported: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese, Russian, Arabic, Hindi.

Quality Presets

Dictate does not install every LLM up front. First run downloads the recommended default for your Mac; selecting another local Quality preset downloads that model once and reuses it forever. If you already run Ollama, LM Studio, vLLM, or another OpenAI-compatible localhost server, choose Local API Server to avoid loading a Dictate-managed cleanup model.

Preset	Speed	Size	Best for
Fast — Qwen2.5 1.5B	~250ms	950MB	Lowest RAM, quick cleanup
Balanced — Qwen3.5 2B	~280ms	1.3GB	Default for most Macs; best speed/quality tradeoff
Quality — Qwen2.5 3B	~400ms	1.8GB	Slower, best polish
Local API Server	varies	0	Use your own localhost LLM server (LM Studio, Ollama, etc.)

Short phrases (15 words or less) skip LLM cleanup entirely for instant output. The app picks the best default model for your chip.

Recommended defaults:

If you want...	Use
Smallest install and lowest memory	Fast
Best default experience	Balanced
Maximum cleanup quality	Quality
No bundled LLM download	Local API Server

End-to-End Pipeline

Full latency from voice → text on screen:

Mode	GPU RAM	Latency
LLM off (raw transcription)	0	~65ms
LLM on (Qwen3.5 2B)	~1.3GB	~345ms
LLM on (Qwen2.5 3B)	~1.8GB	~465ms

With ANE, speech recognition runs on a dedicated chip with its own memory — zero GPU usage. Turn off LLM cleanup and the entire app uses no GPU RAM at all.

Menu Bar

Everything accessible from the waveform icon:

Writing Style — Clean Up, Professional, Bullet Points
Quality — Fast, Balanced, Quality, or localhost API server; missing models download when selected
Input Device — select microphone
Recent — last 10 transcriptions, click to re-paste
STT Engine — ANE (default), Qwen3-ASR, Parakeet, or Whisper
PTT Key — choose your push-to-talk modifier
Languages — input and output language
Sounds — 6 notification tones or silent
Personal Dictionary — names, brands, technical terms always spelled correctly
Launch at Login — auto-start on boot

ANE Engine Setup

The ANE (Apple Neural Engine) engine is the default and recommended STT engine. It requires a small Swift binary that Dictate calls behind the scenes. If the binary isn't installed, Dictate falls back to Parakeet (GPU-based STT).

# Build from source (requires Xcode command line tools)
cd swift-stt
swift build -c release

# The binary lands at swift-stt/.build/release/dictate-stt
# Either add it to your PATH or leave it — Dictate finds it automatically

First run: CoreML models download automatically (~2.7GB) and compile for your chip. This takes 1-2 minutes the first time. After that, models are cached and transcription starts instantly.

Requirements: macOS 14+ (Sonoma or later), Apple Silicon.

What it does: The dictate-stt binary uses FluidAudio to run Parakeet speech recognition on the Neural Engine via CoreML. All processing is local — no network calls after the initial model download.

How it works

When you select ANE in the menu bar, Dictate starts the dictate-stt helper once and keeps it warm:

Dictate records audio and saves it as a temporary WAV file
Starts dictate-stt serve and loads FluidAudio/CoreML models once
Sends each WAV path to the helper over JSON lines
The Swift binary runs the audio through CoreML on the Neural Engine
Returns JSON to stdout: {"text": "Hello world", "duration_ms": 68}
Dictate parses the result and pipes it through LLM cleanup as usual

The binary is a standalone executable with no Python dependency. You can also use it directly:

dictate-stt check                    # Verify ANE is available
dictate-stt transcribe recording.wav # Transcribe a WAV file
dictate-stt serve                    # Keep models warm for repeated requests

Local API Server

If you run a local LLM server, Dictate can use it instead of loading its own model — zero additional RAM:

DICTATE_LLM_BACKEND=api DICTATE_LLM_API_URL=http://localhost:8005/v1/chat/completions dictate

Works with any OpenAI-compatible server on your Mac: vllm-mlx, LM Studio, Ollama.

The API Server preset is still local-first. Remote URLs are blocked unless you explicitly set DICTATE_ALLOW_REMOTE_API=1.

Cloud Policy

Dictate does not need cloud services. Audio and text stay on your Mac by default.

Cloud endpoints are intentionally opt-in only:

DICTATE_ALLOW_REMOTE_API=1 DICTATE_LLM_API_URL=https://example.com/v1/chat/completions dictate

Do not enable this unless you understand that cleaned-up text may leave your machine. Speech recognition remains local unless you replace Dictate's STT pipeline yourself.

Environment Variables

All environment variables

Variable	Description	Default
`DICTATE_AUDIO_DEVICE`	Microphone device index	System default
`DICTATE_OUTPUT_MODE`	`type` or `clipboard`	`type`
`DICTATE_STT_ENGINE`	`ane`, `qwen3-asr`, `parakeet`, or `whisper`	`ane`
`DICTATE_INPUT_LANGUAGE`	`auto`, `en`, `ja`, `ko`, etc.	`auto`
`DICTATE_OUTPUT_LANGUAGE`	Translation target (`auto` = same)	`auto`
`DICTATE_LLM_CLEANUP`	Enable LLM text cleanup	`true`
`DICTATE_LLM_MODEL`	`qwen2.5-1.5b`, `qwen3.5-2b`, `qwen-3b`	`qwen3.5-2b`
`DICTATE_LLM_BACKEND`	`local` or `api`	`local`
`DICTATE_LLM_API_URL`	OpenAI-compatible endpoint	`http://localhost:8005/v1/chat/completions`
`DICTATE_ALLOW_REMOTE_API`	Allow non-localhost API URLs	unset

Agent Integration

Dictate works well as a voice input layer for AI assistants and agent frameworks. If you're building with tools like Claude Code, OpenClaw, or similar — Dictate gives your setup a local, private voice interface with zero cloud dependency.

CLI Commands

dictate              # Launch in menu bar (backgrounds automatically)
dictate config       # View all preferences
dictate config set writing_style professional
dictate config set quality fast
dictate config set ptt_key cmd_r
dictate config set stt whisper
dictate config reset # Reset to defaults
dictate stats        # Show usage statistics
dictate status       # System info and model status
dictate doctor       # Run diagnostic checks (troubleshooting)
dictate devices      # List audio input devices
dictate update       # Update to latest version
dictate -f           # Run in foreground (debug)
dictate -V           # Show version

Config Keys

Key	Values
`writing_style`	clean, professional, bullets
`quality`	api, fast, balanced, quality
`stt`	ane, qwen3-asr, parakeet, whisper
`input_language`	auto, en, ja, de, fr, es, ...
`output_language`	auto, en, ja, de, fr, es, ...
`ptt_key`	ctrl_l, ctrl_r, cmd_r, alt_l, alt_r
`llm_cleanup`	on, off
`sound`	soft_pop, chime, warm, click, marimba, simple
`llm_endpoint`	host:port (for API backend)

Shell Completions

Tab completions for bash and zsh:

# Bash — add to ~/.bashrc
source /path/to/dictate/completions/dictate.bash

# Zsh — copy to fpath dir, then reload
cp completions/dictate.zsh ~/.zsh/completions/_dictate
autoload -Uz compinit && compinit

Completes commands, config keys, and all valid values.

Debugging

# Run in foreground with logs
dictate --foreground

# Check background logs
tail -f ~/Library/Logs/Dictate/dictate.log

Security

All processing is local. Audio and text never leave your machine.
Temporary audio files stored in a private directory with owner-only permissions — not world-readable /tmp.
The ANE engine's dictate-stt binary is open source Swift code you build yourself from swift-stt/. CoreML models download from Hugging Face on first run, then everything is cached locally.
Models restricted to the mlx-community/ HuggingFace namespace only.
LLM endpoints restricted to localhost by default (DICTATE_ALLOW_REMOTE_API=1 to override).
Preferences and stats stored with 0o600 permissions (owner-only read/write).
Log rotation (5MB, 3 backups) prevents disk exhaustion.
HuggingFace telemetry disabled at startup (DO_NOT_TRACK=1).
No API keys, tokens, or accounts required. No unsafe code patterns.

Contributing

Issues and PRs welcome. Run the test suite before submitting:

python -m pytest tests/ -q

See CONTRIBUTING.md for guidelines.

License

MIT — See LICENSES.md for dependency licenses.

Name		Name	Last commit message	Last commit date
Latest commit History 113 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
Formula		Formula
assets		assets
completions		completions
demo		demo
dictate		dictate
docs		docs
scripts		scripts
swift-stt		swift-stt
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.mailmap		.mailmap
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
LICENSES.md		LICENSES.md
README.md		README.md
SECURITY.md		SECURITY.md
benchmark.py		benchmark.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup_app.py		setup_app.py
start-menubar.sh		start-menubar.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Push-to-talk voice dictation that runs entirely on your Mac.
No cloud. No API keys. No subscriptions.

Why Dictate?

Install

Requirements

Features

Push-to-Talk

LLM Text Cleanup

Local STT Engine Stack

Writing Styles

Real-Time Translation

Quality Presets

End-to-End Pipeline

Menu Bar

ANE Engine Setup

Local API Server

Cloud Policy

Environment Variables

Agent Integration

CLI Commands

Config Keys

Shell Completions

Debugging

Security

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Push-to-talk voice dictation that runs entirely on your Mac.No cloud. No API keys. No subscriptions.

Why Dictate?

Install

Requirements

Features

Push-to-Talk

LLM Text Cleanup

Local STT Engine Stack

Writing Styles

Real-Time Translation

Quality Presets

End-to-End Pipeline

Menu Bar

ANE Engine Setup

Local API Server

Cloud Policy

Environment Variables

Agent Integration

CLI Commands

Config Keys

Shell Completions

Debugging

Security

Contributing

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Push-to-talk voice dictation that runs entirely on your Mac.
No cloud. No API keys. No subscriptions.

Packages