Vox - Voice-to-Text with One Keystroke

A fast, lightweight voice transcription tool supporting multiple state-of-the-art ASR models. Press a hotkey to record, and your speech is instantly transcribed and pasted where you need it.

Features

🎙️ One-key recording: Hold Ctrl+Space to record, release to transcribe (push-to-talk)
🤖 Multiple ASR backends: Parakeet, Moonshine, SenseVoice support
🌍 Multilingual: Support for multiple languages depending on model choice
📋 Smart auto-paste: Automatically detects terminals for correct paste method (Ctrl+Shift+V vs Ctrl+V)
🧠 Optional LLM post-processing: Clean up transcriptions with a local ONNX model or cloud API (Anthropic, OpenAI, Gemini)
📜 History tracking: Last 100 transcripts saved to history.json (no audio files)
⚡ Fast daemon mode: Pre-loaded model for instant transcription
⌨️ Near-zero paste latency: Persistent uinput device eliminates per-paste Wayland registration delay
🔒 Privacy first: Runs 100% locally by default; cloud LLM is opt-in
🎯 Wayland native: Tested on Hyprland; works on Sway and other compositors
📦 On-demand model downloads: Models downloaded from HuggingFace on first use

Requirements

Rust 1.70 or later
Microphone access

Linux System Dependencies

Fedora / RHEL / CentOS:

sudo dnf install -y \
  gtk3-devel \
  gdk-pixbuf2-devel \
  cairo-gobject-devel \
  glib2-devel \
  pkgconf-pkg-config

Ubuntu / Debian:

sudo apt install -y \
  libgtk-3-dev \
  libgdk-pixbuf-2.0-dev \
  libcairo-gobject2 \
  libglib2.0-dev \
  pkg-config

Installation

Linux (Quick Install)

git clone https://github.com/yourusername/vox
cd vox
chmod +x install_linux.sh
./install_linux.sh

The installer will:

Build the daemon and client binaries
Install to ~/.local/bin/
Optionally set up systemd service
Configure compositor-specific hotkeys (Hyprland/Sway)

Enable auto-paste (required for Ctrl+V injection):

sudo setcap "cap_dac_override+p" $(which vox-daemon)

Manual Installation

cargo build --release --bin vox-daemon --bin vox
cp target/release/vox-daemon ~/.local/bin/
cp target/release/vox ~/.local/bin/

Usage

Push-to-talk (recommended)

Bind two keys in your compositor — one to start, one to stop:

Hyprland (~/.config/hypr/hyprland.conf):

# Basic: transcribe only
bind = CTRL, SPACE, exec, vox start
bindr = CTRL, SPACE, exec, vox stop

# With LLM cleanup (uses configured LLM)
bind = CTRL SHIFT, SPACE, exec, vox start --llm
bindr = CTRL SHIFT, SPACE, exec, vox stop

# With a custom one-off prompt
bind = SUPER, SPACE, exec, vox start --prompt "Format as bullet points: {text}"
bindr = SUPER, SPACE, exec, vox stop

Toggle mode

vox toggle   # start if stopped, stop if recording

CLI

vox status   # check daemon status
vox quit     # stop daemon
vox model list           # list available models
vox model pull <id>      # download a model
vox model set <id>       # switch active ASR model

Configuration

On first run the daemon writes a template to ~/.config/vox/config.toml with all options commented out. Edit it to enable what you need:

# Transcription model to use.
active_model = "parakeet-tdt-0.6b-v3-int8"

# Automatically paste after transcription (default: true).
auto_paste = true

# Number of CPU threads for inference (default: 4).
# intra_threads = 4

# Optional: microphone override.
# input_device = "Built-in Microphone"

# ── LLM post-processing ────────────────────────────────────────────────────
# Run LLM on every recording by default. Can also be triggered per-recording
# with `vox start --llm` regardless of this setting.
# process_transcription = false

# Prompt template. {text} is replaced with the raw transcript.
# process_prompt = "Clean up the following voice transcription: ..."

# ── Option A: Local ONNX model (private, offline) ──────────────────────────
# llm_model = "qwen3.5-0.8b-fp16"

# ── Option B: Cloud API ────────────────────────────────────────────────────

# Anthropic Claude
# llm_api_provider = "anthropic"
# llm_api_key = "sk-ant-..."
# llm_api_model = "claude-haiku-4-5"

# OpenAI
# llm_api_provider = "openai"
# llm_api_key = "sk-..."
# llm_api_model = "gpt-4o-mini"

# Google Gemini
# llm_api_provider = "gemini"
# llm_api_key = "AIza..."
# llm_api_model = "gemini-2.0-flash"

ASR Models

Models are stored in ~/.config/vox/models/ and downloaded on demand.

Available Models

Parakeet (NVIDIA) — High accuracy English ASR (default)

ID	Size	Notes
`parakeet-tdt-0.6b-v3-int8`	900 MB	Default, best accuracy/speed
`parakeet-tdt-0.6b-v3`	2.4 GB	FP32

Moonshine (UsefulSensors) — Ultra-fast streaming ASR

ID	Size	Notes
`moonshine-tiny`	110 MB	~25× faster than real-time
`moonshine-base`	245 MB	Better accuracy

SenseVoice (Alibaba) — Multilingual with emotion detection

ID	Size	Notes
`sensevoice-small`	937 MB	Chinese / Japanese / Korean / English

vox model list           # see all models + download status
vox model pull <id>      # download
vox model set <id>       # switch (restarts daemon)

LLM Post-processing

Vox can optionally clean up transcriptions through an LLM. The LLM worker runs in the background and is triggered either per-recording (--llm flag) or automatically for every recording (process_transcription = true).

Priority: local ONNX model › cloud API. If only one is configured, that one is used.

Option A — Local ONNX model (private)

No API key, no network access, runs on CPU.

# Download a model first
vox model pull qwen3.5-0.8b-fp16

# Then set in config.toml:
# llm_model = "qwen3.5-0.8b-fp16"

Available LLM models:

ID	Size	Notes
`qwen3.5-0.8b-fp16`	~2 GB	Best quality local option
`qwen3.5-0.8b-q4`	~670 MB	Quantized, faster
`qwen3.5-2b-q4`	~1.5 GB	Larger, more capable
`qwen3.5-4b-q4`	~2.5 GB	Largest local option
`gemma-3-270m-it-q4`	~800 MB	Google Gemma 3 270M
`gemma-3-270m-it-fp16`	~570 MB	Google Gemma 3 270M FP16
`gemma-3-1b-it-q4`	~1.7 GB	Google Gemma 3 1B

Option B — Cloud API

Set llm_api_key and optionally llm_api_provider and llm_api_model.

Provider	`llm_api_provider`	Default model	Key format
Anthropic Claude	`anthropic` (default)	`claude-haiku-4-5`	`sk-ant-...`
OpenAI	`openai`	`gpt-4o-mini`	`sk-...`
Google Gemini	`gemini`	`gemini-2.0-flash`	`AIza...`

Per-recording LLM override

# Use LLM for this recording (regardless of process_transcription setting)
vox start --llm

# Use a custom prompt for this recording
vox start --prompt "Summarise in one sentence: {text}"

Architecture

Audio: cpal for cross-platform audio capture
ASR inference: ONNX Runtime (ort)
Keyboard injection: evdev uinput (Linux) / enigo (macOS/Windows)
Clipboard: wl-copy (Wayland) / arboard (X11, macOS, Windows)
LLM inference: local ONNX backends (Qwen3.5, Gemma 3) or cloud REST APIs
IPC: newline-delimited JSON over a Unix domain socket
CLI: clap

Troubleshooting

Auto-paste not working

# Grant uinput access
sudo setcap "cap_dac_override+p" $(which vox-daemon)

Model download fails

vox model pull <id>   # retry download

LLM not running

Local model: check it is downloaded (vox model list) and llm_model is set in config
API: check the API key is correct and llm_api_provider matches the key type

No input device found

Check microphone permissions in system settings
Use vox device list to see available devices, then set input_device in config

License

MIT / Apache-2.0

Credits

cpal — Cross-platform audio capture
ort — ONNX Runtime Rust bindings
evdev — Linux input device access

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
icons		icons
install		install
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LINUX_README.md		LINUX_README.md
README.md		README.md
config.toml		config.toml
create_app_bundle.sh		create_app_bundle.sh
icon.icns		icon.icns
icon.png		icon.png
install_linux.sh		install_linux.sh
models.json		models.json
overlay.py		overlay.py
parakeet.py		parakeet.py
stop_vox.sh		stop_vox.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vox - Voice-to-Text with One Keystroke

Features

Requirements

Linux System Dependencies

Installation

Linux (Quick Install)

Manual Installation

Usage

Push-to-talk (recommended)

Toggle mode

CLI

Configuration

ASR Models

Available Models

LLM Post-processing

Option A — Local ONNX model (private)

Option B — Cloud API

Per-recording LLM override

Architecture

Troubleshooting

Auto-paste not working

Model download fails

LLM not running

No input device found

License

Credits

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Vox - Voice-to-Text with One Keystroke

Features

Requirements

Linux System Dependencies

Installation

Linux (Quick Install)

Manual Installation

Usage

Push-to-talk (recommended)

Toggle mode

CLI

Configuration

ASR Models

Available Models

LLM Post-processing

Option A — Local ONNX model (private)

Option B — Cloud API

Per-recording LLM override

Architecture

Troubleshooting

Auto-paste not working

Model download fails

LLM not running

No input device found

License

Credits

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages