Listenr is a privacy-first tool for collecting real-world audio and high-quality transcriptions, designed to help build better automatic speech recognition (ASR) models. All processing runs locally on your hardware via Lemonade Server — no audio or text leaves your machine.
- Local-only, private by design. No cloud APIs. All inference runs on your CPU, GPU, or NPU via Lemonade Server.
- Open models. Uses Whisper.cpp for transcription and any GGUF-compatible LLM for post-processing correction.
- Automatic correction pipeline. A local LLM cleans up punctuation, grammar, and homophones — producing a higher-quality training corpus than raw Whisper output alone.
- Real-world data. Collects natural, conversational speech in realistic environments.
- Dataset-ready output. Every utterance is saved with its audio clip and a per-clip JSON, and appended to a single `manifest.jsonl`. One command builds train/dev/test splits.
- Capture. `listenr` streams your microphone to Lemonade's `/realtime` WebSocket in ~85 ms chunks. Audio is captured at the device's native rate and resampled to 16 kHz before sending.
- VAD. Lemonade's built-in server-side voice activity detection segments speech boundaries automatically.
- Transcribe. Lemonade runs Whisper.cpp on each speech segment and streams back interim and final transcripts.
- Correct (optional). The final transcript is sent to a local LLM via Lemonade's chat completions API. The LLM returns a cleaned transcript, an `is_improved` flag, and content `categories`.
- Save. Each utterance is saved as a `.wav` clip and appended to `manifest.jsonl`.
- Build dataset. `build_dataset.py` reads the manifest and writes train/dev/test CSV splits.
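The capture step above can be sketched in a few lines. This is a simplified illustration, not listenr's actual code: it uses naive linear resampling, and `resample_linear` / `chunks_for_stream` are hypothetical names.

```python
TARGET_RATE = 16_000   # the /realtime endpoint expects 16 kHz audio
CHUNK_MS = 85          # approximate duration of each WebSocket frame

def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (a real implementation
    would low-pass filter first to avoid aliasing)."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        j = int(pos)
        frac = pos - j
        s0 = samples[j]
        s1 = samples[min(j + 1, len(samples) - 1)]
        out.append(s0 + (s1 - s0) * frac)
    return out

def chunks_for_stream(samples, src_rate):
    """Yield ~85 ms chunks of 16 kHz audio, ready to send over the socket."""
    audio = resample_linear(samples, src_rate, TARGET_RATE)
    chunk_len = TARGET_RATE * CHUNK_MS // 1000  # 1360 samples per chunk
    for start in range(0, len(audio), chunk_len):
        yield audio[start:start + chunk_len]
```

One second of 48 kHz audio becomes 16,000 samples, or twelve ~85 ms chunks (the last one partial).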
- Lemonade Server running on `localhost:8000`
- Python 3.13+ with `uv` (recommended) or `pip`
- A microphone accessible via PipeWire or ALSA
```
git clone https://github.com/Rebreda/listenr
cd listenr
uv pip install -e .
```

Then run commands via `uv run` (no activation needed):

```
uv run listenr
```

Or activate the venv once per session:

```
source .venv/bin/activate
listenr
```

Start Lemonade Server before launching Listenr:

```
lemonade-server serve
```

Listenr will automatically call `POST /api/v1/load` on startup to load the configured models. On first use, Lemonade will download them.
```
# Record and save everything (default)
uv run listenr

# Don't save to disk — just print transcriptions
uv run listenr --no-save

# Also print the raw Whisper output before LLM correction
uv run listenr --show-raw

# Verbose debug output (WebSocket messages, mic RMS, etc.)
uv run listenr --debug
```

Example output:
```
🎤 Listenr CLI — streaming to Lemonade
Model  : Whisper-Large-v3-Turbo
WS URL : ws://localhost:9000/realtime?model=Whisper-Large-v3-Turbo
LLM    : enabled (gpt-oss-20b-mxfp4-GGUF)
Save   : yes → ~/.listenr/audio_clips
Press Ctrl+C to stop.

[ASR] I'm going to the store to buy some milk. [dictation]
[SAVED] ~/.listenr/audio_clips/audio/2026-02-28/clip_2026-02-28_abc123.wav (2.4s)
```
Press Ctrl+C to stop. Listenr will unload all models from Lemonade before exiting.
After collecting recordings, generate train/dev/test splits from manifest.jsonl:
```
# Default: 80/10/10 CSV splits in ~/listenr_dataset/
uv run listenr-build-dataset

# Custom output directory and split ratio
uv run listenr-build-dataset --output ~/my_dataset --split 90/5/5

# Exclude very short clips
uv run listenr-build-dataset --min-duration 1.0

# HuggingFace datasets format
uv run listenr-build-dataset --format hf

# Preview stats without writing files
uv run listenr-build-dataset --dry-run
```

Output CSV columns: `uuid`, `split`, `audio_path`, `raw_transcription`, `corrected_transcription`, `is_improved`, `categories`, `duration_s`, `sample_rate`, `whisper_model`, `llm_model`, `timestamp`.
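The split assignment can be sketched roughly like this. This is an illustration only — `build_splits` is a hypothetical helper, not the actual `build_dataset.py` code, and it ignores `--min-duration` filtering and output formats:

```python
import json
import random

def build_splits(manifest_path, ratios=(0.8, 0.1, 0.1), seed=0):
    """Read manifest.jsonl, shuffle deterministically, and tag each
    row with a train/dev/test split according to the given ratios."""
    with open(manifest_path) as f:
        rows = [json.loads(line) for line in f if line.strip()]
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * ratios[0])
    n_dev = int(len(rows) * ratios[1])
    for i, row in enumerate(rows):
        row["split"] = ("train" if i < n_train
                        else "dev" if i < n_train + n_dev
                        else "test")
    return rows
```

A fixed seed keeps splits reproducible across re-runs, so a clip never migrates from test into train.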
Transcribe a single audio file:
```
python -m listenr.unified_asr --audio path/to/audio.wav --whisper-model Whisper-Large-v3-Turbo

# With LLM correction
python -m listenr.unified_asr --llm --audio path/to/audio.wav
```

Config is created with defaults at `~/.config/listenr/config.ini` on first run.
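For illustration, a minimal config might look like the following. The `[audio]` section name and all values here are assumptions; only `input_device`, the `[VAD]` keys (`threshold`, `silence_duration_ms`), and `LLM.enabled` are settings actually referenced in this README:

```ini
[audio]
input_device = 0            ; device index or (partial) name

[VAD]
threshold = 0.01            ; raise to ignore background noise
silence_duration_ms = 800   ; lower for shorter segments

[LLM]
enabled = true              ; set false to skip correction
```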
List available input devices:

```
python -c "import sounddevice as sd; [print(f'{i}: {d[\"name\"]}') for i, d in enumerate(sd.query_devices()) if d['max_input_channels'] > 0]"
```

Set `input_device` to the device name (partial match works) or its index number.
| Goal | Setting |
|---|---|
| Shorter segments | Lower `silence_duration_ms` (e.g. 500) |
| Avoid cutting off speech | Raise `silence_duration_ms` (e.g. 1200) |
| Ignore background noise | Raise `threshold` (e.g. 0.05) |
| Capture quiet speech | Lower `threshold` (e.g. 0.005) |
One JSON object per line — append-only, easy to query:
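For example, a manifest line might look like this (field values are illustrative, drawn from the sample output above; pretty-printed here for readability, the file stores each object on one line):

```json
{
  "uuid": "abc123",
  "audio_path": "audio/2026-02-28/clip_2026-02-28_abc123.wav",
  "raw_transcription": "im going to the store to buy some milk",
  "corrected_transcription": "I'm going to the store to buy some milk.",
  "is_improved": true,
  "categories": ["dictation"],
  "duration_s": 2.4,
  "sample_rate": 16000,
  "whisper_model": "Whisper-Large-v3-Turbo",
  "llm_model": "gpt-oss-20b-mxfp4-GGUF",
  "timestamp": "2026-02-28T12:00:00Z"
}
```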
```
# All improved clips
jq 'select(.is_improved == true)' ~/.listenr/audio_clips/manifest.jsonl

# Clips tagged as commands
jq 'select(.categories[] == "command")' ~/.listenr/audio_clips/manifest.jsonl

# Load into pandas
python -c "import pandas as pd; df = pd.read_json('~/.listenr/audio_clips/manifest.jsonl', lines=True); print(df.head())"
```

No transcriptions appear / `[SAVE SKIPPED] pcm_buffer is empty`
- Check that Lemonade is running: `curl http://localhost:8000/api/v1/health`
- Run with `--debug` to see mic RMS values and WebSocket messages
- If RMS stays near `0.000`, your `input_device` is wrong — list devices and update config (see above)
- Lower `threshold` in `[VAD]` if your mic is quiet
LLM correction not working / model answers the transcription instead of fixing it
- Confirm `LLM.enabled = true` and the model name matches one loaded in Lemonade
- Check `curl http://localhost:8000/api/v1/models` to see loaded models
- LLM errors are non-fatal — the raw transcript is saved regardless
`Could not discover Lemonade websocket port`

Lemonade is not running or not reachable on port 8000. Run `lemonade-server serve` first.
Too many / too few segments
Adjust `silence_duration_ms` and `threshold` under `[VAD]` in your config.
| Model | Type | Notes |
|---|---|---|
| `Whisper-Tiny` | ASR | Fast, lower accuracy |
| `Whisper-Large-v3-Turbo` | ASR | Best accuracy |
| `gpt-oss-20b-mxfp4-GGUF` | LLM | Good correction quality |
| `Gemma-3-4b-it-GGUF` | LLM | Lighter alternative |
| `DeepSeek-Qwen3-8B-GGUF` | LLM | Lighter alternative |
List all models available on your Lemonade instance:
```
curl -s http://localhost:8000/api/v1/models | python3 -c "import sys,json; [print(m['id']) for m in json.load(sys.stdin)['data']]"
```

Mozilla Public License Version 2.0 — see LICENSE.
- Lemonade Server — unified local inference API
- whisper.cpp — fast local ASR
- llama.cpp — fast local LLMs
