Minimal macOS push-to-talk speech-to-text app scaffolded for the architecture described in the product and hardware research docs.
- Native macOS menu bar app shell using SwiftUI `MenuBarExtra`
- Clean seams between hardware input, audio capture, transcription, and UI orchestration
- Global hold-space push-to-talk with short-tap space passthrough
- Mac microphone capture via AVFoundation
- Deepgram live transcription over WebSocket
- Keychain-backed Deepgram API key storage
- Automatic insertion of the final transcript into the currently focused chat or text field
- Future-ready interfaces for USB HID and USB Audio replacements
- Coordinator tests for the session state machine
- Deepgram message decoding tests
- File-backed audio source test for integration-style coverage
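The Deepgram message decoding listed above can be sketched with `Codable`. This is a minimal sketch, assuming Deepgram's live `Results` message carries `type`, `is_final`, and `channel.alternatives[].transcript`; only that subset is modeled, and the names `DeepgramLiveMessage`, `TranscriptUpdate`, and `decodeTranscriptUpdate` are hypothetical, not this repo's actual types.

```swift
import Foundation

// Hypothetical model of the subset of a Deepgram live "Results" message used here.
// Unknown keys (confidence, words, channel_index, ...) are ignored by JSONDecoder.
struct DeepgramLiveMessage: Decodable {
    struct Channel: Decodable {
        struct Alternative: Decodable {
            let transcript: String
        }
        let alternatives: [Alternative]
    }

    let type: String
    let isFinal: Bool?
    let channel: Channel?

    enum CodingKeys: String, CodingKey {
        case type
        case isFinal = "is_final"
        case channel
    }
}

// Hypothetical app-facing update: interim text can be previewed, finalized text inserted.
enum TranscriptUpdate: Equatable {
    case interim(String)
    case finalized(String)
}

// Decodes one WebSocket text frame. Returns nil for non-Results messages
// (e.g. Metadata), malformed JSON, or an empty transcript.
func decodeTranscriptUpdate(from json: String) -> TranscriptUpdate? {
    guard let data = json.data(using: .utf8),
          let message = try? JSONDecoder().decode(DeepgramLiveMessage.self, from: data),
          message.type == "Results",
          let text = message.channel?.alternatives.first?.transcript,
          !text.isEmpty
    else { return nil }
    return (message.isFinal ?? false) ? .finalized(text) : .interim(text)
}
```

Treating `is_final` as the commit signal is one design choice; a push-to-talk app could also wait for key release, or use Deepgram's `speech_final` flag, before inserting text.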
- `Sources/SpeechBarDomain`: shared protocols, events, audio descriptors, and state enums.
- `Sources/SpeechBarApplication`: `VoiceSessionCoordinator` and the app-level orchestration logic.
- `Sources/SpeechBarInfrastructure`: on-screen push-to-talk source, global Space-key capture, Mac microphone capture, Deepgram client, Keychain, and transcript delivery implementations.
- `Sources/SpeechBarApp`: menu bar app shell and UI.
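The seam between hardware input and session orchestration can be sketched as a protocol plus a small state machine. In this minimal sketch only the name `HardwareEventSource` comes from this README; the `HardwareEvent` cases, the `SessionState` values, the closure-based wiring, and `PushToTalkStateMachine` itself are hypothetical illustrations, not the repo's `VoiceSessionCoordinator`.

```swift
// Hypothetical hardware events; a USB HID source would emit the same ones.
enum HardwareEvent {
    case pushToTalkPressed
    case pushToTalkReleased
}

// Sketch of the input seam: anything that can report press/release events.
protocol HardwareEventSource: AnyObject {
    var onEvent: ((HardwareEvent) -> Void)? { get set }
}

// Hypothetical session states.
enum SessionState: Equatable {
    case idle
    case recording
    case finalizing
}

// Minimal state machine: press starts recording, release stops it and waits
// for the final transcript, whose delivery returns the session to idle.
final class PushToTalkStateMachine {
    private(set) var state: SessionState = .idle

    init(source: HardwareEventSource) {
        source.onEvent = { [weak self] event in self?.handle(event) }
    }

    func handle(_ event: HardwareEvent) {
        switch (state, event) {
        case (.idle, .pushToTalkPressed):
            state = .recording   // real code would start mic capture + streaming here
        case (.recording, .pushToTalkReleased):
            state = .finalizing  // real code would stop capture and await the final result
        default:
            break                // ignore events that don't apply in the current state
        }
    }

    func finalTranscriptDelivered() {
        if state == .finalizing { state = .idle }
    }
}

// Test double standing in for the on-screen button or a future USB HID device.
final class FakeEventSource: HardwareEventSource {
    var onEvent: ((HardwareEvent) -> Void)?
    func emit(_ event: HardwareEvent) { onEvent?(event) }
}
```

Because the state machine only sees `HardwareEvent` values, swapping the on-screen button for a USB HID device would mean writing another `HardwareEventSource` conformer while the transitions stay unchanged.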
The app never hardcodes a Deepgram API key in source or plist files.
- Launch the app.
- Paste your Deepgram key into the secure field.
- Save it to Keychain.
Treat any key that has been shared in chat as exposed, and rotate it before any real use.
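Keychain-backed storage of the key can be sketched as follows. This is a minimal sketch: the `APIKeyStore` protocol, the `KeychainAPIKeyStore` and `InMemoryAPIKeyStore` types, and the service/account strings are hypothetical, while `SecItemDelete`, `SecItemAdd`, and `SecItemCopyMatching` are the standard Security framework calls for generic-password items. The in-memory store lets tests run without touching the real Keychain.

```swift
import Foundation
#if canImport(Security)
import Security
#endif

// Hypothetical abstraction so the key is only ever read at runtime.
protocol APIKeyStore {
    func save(_ key: String) throws
    func load() -> String?
}

// Test double: keeps the key in memory only.
final class InMemoryAPIKeyStore: APIKeyStore {
    private var stored: String?
    func save(_ key: String) throws { stored = key }
    func load() -> String? { stored }
}

#if canImport(Security)
struct KeychainError: Error { let status: OSStatus }

// Stores the key as a generic-password item (service/account values are hypothetical).
struct KeychainAPIKeyStore: APIKeyStore {
    var service = "com.example.speechbar"
    var account = "deepgram-api-key"

    private var baseQuery: [String: Any] {
        [kSecClass as String: kSecClassGenericPassword,
         kSecAttrService as String: service,
         kSecAttrAccount as String: account]
    }

    func save(_ key: String) throws {
        _ = SecItemDelete(baseQuery as CFDictionary)   // drop any existing item first
        var query = baseQuery
        query[kSecValueData as String] = Data(key.utf8)
        let status = SecItemAdd(query as CFDictionary, nil)
        guard status == errSecSuccess else { throw KeychainError(status: status) }
    }

    func load() -> String? {
        var query = baseQuery
        query[kSecReturnData as String] = true
        query[kSecMatchLimit as String] = kSecMatchLimitOne
        var result: AnyObject?
        guard SecItemCopyMatching(query as CFDictionary, &result) == errSecSuccess,
              let data = result as? Data else { return nil }
        return String(data: data, encoding: .utf8)
    }
}
#endif
```

Deleting before adding keeps the sketch short; `SecItemUpdate` is the usual alternative when an item may already exist.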
This workspace currently has the Swift command-line tools but not a full Xcode installation, so the app is built with SwiftPM and shell scripts.
Build with SwiftPM:

```bash
cd /Users/lixingting/Desktop/StartUp/Code
swift build
```

Build the app bundle and launch it:

```bash
cd /Users/lixingting/Desktop/StartUp/Code
./Scripts/build_app_bundle.sh
open ./dist/SlashVibe.app
```

Package a release:

```bash
cd /Users/lixingting/Desktop/StartUp/Code
./Scripts/package_release.sh
```

This creates a distributable zip in `release/`.
To use global hold-Space capture and automatic transcript insertion into another app, grant the app these permissions:
- Allow microphone access
- Allow Accessibility access
- If macOS asks, allow Input Monitoring for the app so it can watch the space key globally
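Watching Space globally only gives the app raw key-down and key-up events; telling a short tap from a hold is a timing rule on top of them. This sketch assumes a hypothetical 300 ms threshold and a hypothetical `SpaceKeyGesture` type driven by timestamps; the app's actual threshold and how it re-injects a space for short taps are not specified here.

```swift
import Foundation

// Hypothetical actions the global key monitor would carry out.
enum SpaceKeyAction: Equatable {
    case noAction
    case startRecording
    case stopRecording
    case typeSpace
}

// Decides, from key timing alone, whether Space was a tap or a hold.
struct SpaceKeyGesture {
    let holdThreshold: TimeInterval      // assumed value, not taken from the app
    private var pressedAt: TimeInterval?
    private var isRecording = false

    init(holdThreshold: TimeInterval = 0.3) { self.holdThreshold = holdThreshold }

    // Key-down yields no action yet; the monitor is assumed to swallow the event.
    mutating func keyDown(at time: TimeInterval) -> SpaceKeyAction {
        if pressedAt == nil { pressedAt = time }   // ignore auto-repeat key-downs
        return .noAction
    }

    // Called periodically (e.g. from a timer) while the key may still be held.
    mutating func tick(at time: TimeInterval) -> SpaceKeyAction {
        guard let start = pressedAt, !isRecording, time - start >= holdThreshold else {
            return .noAction
        }
        isRecording = true
        return .startRecording
    }

    // Key-up: a hold ends recording; anything shorter becomes a normal space.
    mutating func keyUp(at time: TimeInterval) -> SpaceKeyAction {
        defer { pressedAt = nil; isRecording = false }
        if isRecording { return .stopRecording }
        return pressedAt == nil ? .noAction : .typeSpace
    }
}
```

Keeping the classifier free of event-tap code makes the tap/hold rule unit-testable; a global event tap (for example one created with `CGEvent.tapCreate`) could feed it timestamps.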
Run the test suite:

```bash
cd /Users/lixingting/Desktop/StartUp/Code
swift test
```

Notes:

- The default Deepgram model is `nova-2` with `language=zh-CN`.
- Audio is normalized to 16 kHz / mono / `linear16` before streaming.
- KeepAlive messages are sent every 4 seconds.
- A short press on `Space` still types a normal space; a long press on `Space` starts voice capture.
- Replacing the on-screen button with a future USB HID input should only require a new `HardwareEventSource` implementation.
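The streaming settings in the notes above can be collected in one place. This minimal sketch assumes Deepgram's `wss://api.deepgram.com/v1/listen` endpoint with the `model`, `language`, `encoding`, `sample_rate`, and `channels` query parameters, `Token` authorization, and a `{"type":"KeepAlive"}` text message; the `DeepgramStreamConfig` type is hypothetical, and the real client may set other parameters.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking   // URLRequest lives here on Linux
#endif

// Hypothetical bundle of the streaming settings described in the notes.
struct DeepgramStreamConfig {
    var model = "nova-2"
    var language = "zh-CN"
    var sampleRate = 16_000
    var channels = 1
    var keepAliveInterval: TimeInterval = 4   // seconds between KeepAlive frames

    // Raw linear16 has no header, so the format must be described in the query.
    func listenURL() -> URL? {
        var components = URLComponents(string: "wss://api.deepgram.com/v1/listen")
        components?.queryItems = [
            URLQueryItem(name: "model", value: model),
            URLQueryItem(name: "language", value: language),
            URLQueryItem(name: "encoding", value: "linear16"),
            URLQueryItem(name: "sample_rate", value: String(sampleRate)),
            URLQueryItem(name: "channels", value: String(channels)),
        ]
        return components?.url
    }

    // The key is supplied at runtime (e.g. loaded from Keychain), never hardcoded.
    func request(apiKey: String) -> URLRequest? {
        guard let url = listenURL() else { return nil }
        var request = URLRequest(url: url)
        request.setValue("Token \(apiKey)", forHTTPHeaderField: "Authorization")
        return request
    }

    // Text frame sent on a timer so the socket stays open during silence.
    static let keepAliveMessage = #"{"type":"KeepAlive"}"#
}
```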
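Normalizing to 16 kHz / mono / `linear16` amounts to downmixing, resampling, and converting to signed 16-bit little-endian PCM. On macOS, AVFoundation's `AVAudioConverter` is the usual tool; the hypothetical `normalizeToLinear16` below is a pure-Swift sketch that assumes interleaved Float32 input in [-1, 1] and uses linear interpolation, which is simpler but lower quality than a band-limited resampler.

```swift
import Foundation

// Hypothetical helper: interleaved Float32 frames -> 16 kHz mono linear16 bytes.
func normalizeToLinear16(
    samples: [Float],          // interleaved, nominal range [-1, 1]
    channelCount: Int,
    inputRate: Double,
    outputRate: Double = 16_000
) -> Data {
    precondition(channelCount > 0 && inputRate > 0 && outputRate > 0)
    let frameCount = samples.count / channelCount
    guard frameCount > 0 else { return Data() }

    // 1. Downmix: average all channels of each frame.
    var mono = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Float = 0
        for ch in 0..<channelCount { sum += samples[frame * channelCount + ch] }
        mono[frame] = sum / Float(channelCount)
    }

    // 2. Resample by linear interpolation (no anti-aliasing filter).
    let ratio = inputRate / outputRate
    let outCount = Int(Double(frameCount) / ratio)
    var data = Data(capacity: outCount * 2)
    for i in 0..<outCount {
        let position = Double(i) * ratio
        let index = min(Int(position), frameCount - 1)
        let next = min(index + 1, frameCount - 1)
        let fraction = Float(position - Double(index))
        let value = mono[index] + (mono[next] - mono[index]) * fraction

        // 3. Clamp, scale to Int16, and append low byte then high byte.
        let clamped = max(-1, min(1, value))
        let bits = UInt16(bitPattern: Int16(clamped * Float(Int16.max)))
        data.append(UInt8(truncatingIfNeeded: bits))
        data.append(UInt8(truncatingIfNeeded: bits >> 8))
    }
    return data
}
```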
- Install and sharing guide: INSTALL.md
- GitHub upload guide: GITHUB_SETUP.md
Xiaohongshu Hackathon Championship · GitHub Topic redhackathon