A macOS desktop app that records meetings, transcribes them, and produces structured notes — with action items pushed directly as GitHub issues and notes committed to a git repository.
Built with Tauri + Svelte 5 on the frontend, Rust on the backend.
- Records two audio streams in parallel — local mic and remote source (system audio via BlackHole) — with live dual VU meters
- Transcribes live both streams in real time using local whisper.cpp with Metal GPU acceleration — no audio leaves your machine. VAD (Silero) detects speech boundaries and feeds chunks to Whisper as you speak, so the transcript appears during the meeting.
- Merges the two transcripts into a single time-sorted conversation, each segment labelled with the correct speaker name (one stream = one speaker, no diarization needed)
- Summarizes with Claude Sonnet (Anthropic) or any Together.ai chat model, using a context you define (meeting type, participants, domain vocabulary)
- Exports structured Markdown notes (
YYYY-MM-DD_HHmm_Context.md) to a configurable folder - Commits the notes to a git repository automatically (optional)
- Creates GitHub issues for every action item extracted from the notes (optional)
Privacy: All audio is processed locally on-device. Only the final text transcript is sent to the cloud for summarization (Anthropic or Together.ai). A Groq Whisper cloud fallback is available in settings if preferred.
Each meeting produces two files:
Meetings/
2026-03-20_1430_welqin.md ← structured notes
2026-03-20_1430_welqin_transcript.md ← raw transcript with named speakers
The notes follow a consistent structure:
## Participants
## Summary
## Key Discussion Points
## Decisions Made
## Action Items
- [ ] **David**: Review the API design document
- [ ] **Yannick**: Set up CI pipeline for stagingAction items are automatically created as GitHub issues with the meeting-action label.
- macOS Apple Silicon (M1 or later — Metal GPU required for local Whisper)
- BlackHole 2ch — virtual audio driver for capturing system audio (see setup guide)
cmake— required to build whisper.cpp (brew install cmake)- One of (for summarization):
- Anthropic API key — to use Claude Sonnet
- Together.ai API key — to use open-source models (Llama, etc.)
ghCLI installed and authenticated — for GitHub issues (optional)git— for committing notes (optional)- Groq API key — only if using cloud transcription fallback (optional)
The Whisper model (
large-v3-turbo, quantized q5_0, ~550 MB) is downloaded automatically on first launch to~/.sidekick2000/models/.
npm installLaunch the app and click the gear icon in the top-right corner. All settings are stored in ~/.sidekick2000/settings.json.
| Tab | What to configure |
|---|---|
| API Keys | Transcription mode (Local Whisper or Groq); summarization provider (Claude or Together.ai) and the corresponding API key/model |
| Devices | Local mic device + your speaker name; remote source device + remote speaker name |
| Repository | Working folder (git root), meetings subfolder, GitHub repo (owner/repo), default language, pipeline step toggles |
| Contexts | Meeting context templates — instructions that shape how the AI summarizes each meeting type |
| Speakers | Default meeting attendees pre-loaded at startup (for AI context) |
API keys set in Settings take priority over environment variables. You can still use a
.envfile as fallback.
npm run tauri devnpm run tauri buildContexts are the core of Sidekick2000's flexibility. Each context is a Markdown document that gives Claude background knowledge about a meeting type: who the participants are, domain vocabulary, and how to structure the notes.
Examples of contexts you might create:
- General — neutral instructions, works for any meeting
- Product review — focus on decisions, feature requests, backlog items
- Client call — highlight commitments, risks, next steps
- Training session — track exercises, Q&A, shortcuts mentioned
Contexts are managed entirely in the Settings UI (no external files needed).
Record local mic ─────────────────┐ (parallel, ring buffers)
Record remote source ───────────────┤
│
┌───────────────┴───────────────┐
▼ ▼
Worker thread Worker thread
(drain every 200ms) (drain every 200ms)
│ │
▼ ▼
VAD (Silero) VAD (Silero)
silence ≥ 300ms → flush silence ≥ 300ms → flush
│ │
▼ ▼
Whisper (Metal) Whisper (Metal)
whisper.cpp local whisper.cpp local
│ │
▼ ▼
emit "live-segment" emit "live-segment"
→ frontend display → frontend display
└───────────────┬───────────────┘
│
▼
Merge — sort by timestamp, speakers already known
│
▼
Summarize (Claude Sonnet or Together.ai — skipped if disabled)
│
▼
Export YYYY-MM-DD_HHmm_Context.md
│
▼
Git commit (if enabled and working folder configured)
│
▼
Create GitHub issues (if enabled and repo configured)
~/.sidekick2000/settings.json — created automatically on first save.
{
"transcription_mode": "LocalWhisper",
"groq_api_key": "",
"anthropic_api_key": "sk-ant-...",
"together_ai_api_key": "",
"summarization_provider": "claude",
"together_ai_model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
"default_input_device": "MacBook Pro Microphone",
"local_speaker_name": "David",
"remote_device": "BlackHole 2ch",
"remote_speaker_name": "Remote",
"working_folder": "/Users/you/my-repo",
"github_repo": "owner/repo",
"meetings_subfolder": "Meetings",
"default_language": "fr",
"enable_summary": true,
"enable_git_commit": true,
"enable_github_issues": true,
"default_speakers": [
{ "name": "Alice", "organization": "Acme" }
],
"contexts": [
{
"id": "general",
"label": "General",
"content": "Be factual. Group by theme. Use professional tone."
}
]
}| Layer | Technology |
|---|---|
| UI | Svelte 5, Tailwind CSS 4 |
| Desktop shell | Tauri 2 |
| Backend | Rust (async with Tokio) |
| Transcription | Local whisper.cpp via whisper-rs (large-v3-turbo q5_0, Metal GPU). Groq Whisper API as optional cloud fallback. |
| VAD | Silero VAD via voice_activity_detector (ONNX) — detects speech/silence for live chunking |
| Summarization | Anthropic Claude Sonnet or Together.ai (configurable) |
| Speaker identification | Device-based — each stream has a pre-assigned speaker name |
| Audio capture | CPAL (two simultaneous input streams, ring buffers, shared t=0 origin) |
| GitHub integration | gh CLI |