A cross-platform desktop application built with Electron that provides real-time voice translation during voice calls. The application captures audio from microphones, translates speech in real-time, and outputs the translated audio through a virtual microphone that other applications can use.
- Real-time audio capture from selected microphone devices
- Speech-to-text processing using OpenAI Whisper API
- Text translation with OpenAI GPT models
- Text-to-speech synthesis using ElevenLabs API
- Virtual microphone output for integration with voice call applications
- Test mode that outputs audio to headphones for verification
- Professional UI with modern design and animations
- Device selection (microphones with real-time detection)
- Language selection (20+ languages with flag icons)
- Voice selection from ElevenLabs voice library
- Settings modal for API key management
- Debug console for real-time monitoring
- Status indicators with visual feedback
- Persistent settings storage across app restarts
- API key validation and secure storage
- Device preferences with automatic detection
- Audio quality and processing settings
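The persistent-settings behavior described above could be sketched as follows. `ConfigurationManager` matches the class named later in the architecture section, but the JSON-file approach and the field names here are illustrative assumptions, not the app's actual implementation (in the real app the file would live under Electron's `app.getPath("userData")` directory).

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical settings shape; the field names are illustrative.
interface AppSettings {
  openaiApiKey: string;
  elevenLabsApiKey: string;
  targetLanguage: string;
  microphoneId: string;
}

// Minimal sketch of a ConfigurationManager that persists settings as JSON
// so they survive app restarts.
class ConfigurationManager {
  constructor(private filePath: string) {}

  save(settings: AppSettings): void {
    // Ensure the parent directory exists, then write pretty-printed JSON.
    fs.mkdirSync(path.dirname(this.filePath), { recursive: true });
    fs.writeFileSync(this.filePath, JSON.stringify(settings, null, 2));
  }

  load(): AppSettings | null {
    // Returns null on first run, before any settings have been saved.
    if (!fs.existsSync(this.filePath)) return null;
    return JSON.parse(fs.readFileSync(this.filePath, "utf8")) as AppSettings;
  }
}
```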
- Click "🧪 Test Translation"
- Translates the sample text "Hello, this is a test" into your target language
- Plays translated audio through your headphones/speakers
- Perfect for testing the pipeline before going live
- Click "▶️ Start Translation"
- Speak into your selected microphone
- Audio is processed through the complete pipeline:
- Audio Capture → Speech-to-Text → Translation → Text-to-Speech
- Translated audio is sent to virtual microphone
- Other apps (Zoom, Teams, Discord) can use the virtual microphone as input
- 🎧 Hear Yourself: Records 3 seconds from your mic and plays it back
- 📢 Test Virtual Mic: Sends test audio to virtual microphone for other apps
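For the "Hear Yourself" test, the actual capture happens in the renderer via the Web Audio API; the only arithmetic involved in the 3-second limit is `samples = seconds × sampleRate`, which a hypothetical helper (not part of the app's real API) could apply like this:

```typescript
// Hypothetical helper for the "Hear Yourself" test: trim a captured mono
// buffer to the first `seconds` of audio. Buffers shorter than the limit
// pass through unchanged.
function clipToSeconds(
  samples: Float32Array,
  sampleRate: number,
  seconds: number
): Float32Array {
  const maxSamples = Math.min(samples.length, Math.floor(seconds * sampleRate));
  return samples.slice(0, maxSamples);
}
```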
- Node.js 18+ and npm
- OpenAI API key (for speech-to-text and translation)
- ElevenLabs API key (for text-to-speech)
- Microphone access permissions
```bash
npm install
npm run build
npm run dev:simple
```
- Click the "⚙️ Settings" button
- Enter your OpenAI API key
- Enter your ElevenLabs API key
- Click "Save Settings"
- Select microphone from dropdown
- Choose target language (e.g., Spanish, French, etc.)
- Pick a voice for text-to-speech output
- Test the system using the test buttons
- Go to https://platform.openai.com/
- Create account and generate API key
- Used for: Speech-to-text (Whisper) and Translation (GPT)
- Go to https://elevenlabs.io/
- Create account and generate API key
- Used for: Text-to-speech synthesis
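Before saving keys in Settings, a quick shape check can catch obvious typos. The `sk-` prefix for OpenAI secret keys is a common convention, and the ElevenLabs pattern below is purely an assumption; real validation means making an authenticated request (e.g. listing models or voices) and checking that it succeeds.

```typescript
// Illustrative pre-flight checks on API key shape. These only catch
// obvious mistakes; a key that passes may still be invalid or expired.
function looksLikeOpenAIKey(key: string): boolean {
  // OpenAI secret keys conventionally start with "sk-".
  return key.startsWith("sk-") && key.length > 20;
}

function looksLikeElevenLabsKey(key: string): boolean {
  // Assumed shape: a longish alphanumeric token (not an official format).
  return /^[A-Za-z0-9_]{20,}$/.test(key.trim());
}
```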
- 🧪 Test Translation → Verifies the complete translation pipeline
- 🎧 Hear Yourself → Tests microphone input (3-second recording)
- 📢 Test Virtual Mic → Tests virtual microphone output
- ▶️ Start Translation → Begins real-time processing
- ⏹️ Stop Translation → Stops processing
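The Start/Stop pair implies simple session semantics: pressing Start while already running, or Stop while idle, should be a no-op. A minimal sketch of that guard (the class name and callbacks are illustrative, not the app's real API):

```typescript
// Illustrative start/stop guard for a translation session.
// start() while running and stop() while idle both return false
// and fire no callback.
class TranslationSession {
  private running = false;

  start(onStart: () => void): boolean {
    if (this.running) return false;
    this.running = true;
    onStart();
    return true;
  }

  stop(onStop: () => void): boolean {
    if (!this.running) return false;
    this.running = false;
    onStop();
    return true;
  }

  isRunning(): boolean {
    return this.running;
  }
}
```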
- Open your video call app (Zoom, Teams, Discord, etc.)
- Select "Virtual Microphone Output" as your microphone
- Speak in your language → Others hear the translation
- Monitor the debug console for real-time status
- ProcessingOrchestrator: Manages the complete translation pipeline
- AudioCaptureService: Handles microphone input and audio processing
- TranslationServiceManager: Manages OpenAI translation requests
- TextToSpeechManager: Handles ElevenLabs voice synthesis
- VirtualMicrophoneManager: Manages audio output routing
- ConfigurationManager: Handles settings and API key storage
Microphone Input → Audio Capture → Speech-to-Text → Translation → Text-to-Speech → Virtual Microphone Output
- Test Mode: Audio → System Speakers (for testing)
- Live Mode: Audio → Virtual Microphone (for other apps)
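The pipeline and the two output modes above can be sketched as composed async stages with a mode-dependent sink. All type and function names here are illustrative, not the app's real APIs; the comments note which external service each stage would call.

```typescript
// Sketch of the translation pipeline with pluggable stages and a
// mode-dependent output sink.
type AudioChunk = Float32Array;

interface PipelineStages {
  transcribe: (audio: AudioChunk) => Promise<string>; // OpenAI Whisper
  translate: (text: string) => Promise<string>;       // OpenAI GPT
  synthesize: (text: string) => Promise<AudioChunk>;  // ElevenLabs TTS
}

type OutputMode = "test" | "live";

async function runPipeline(
  audio: AudioChunk,
  stages: PipelineStages,
  mode: OutputMode,
  sinks: {
    speakers: (a: AudioChunk) => void;
    virtualMic: (a: AudioChunk) => void;
  }
): Promise<void> {
  const text = await stages.transcribe(audio);
  const translated = await stages.translate(text);
  const spoken = await stages.synthesize(translated);
  // Test mode plays through the speakers; live mode feeds the virtual mic.
  (mode === "test" ? sinks.speakers : sinks.virtualMic)(spoken);
}
```

Injecting the stages as functions keeps each service (Whisper, GPT, ElevenLabs) swappable and makes the pipeline testable with mocks.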
**No microphone access**
- Solution: Grant microphone permissions in system settings
- Windows: Settings → Privacy → Microphone
- macOS: System Preferences → Security & Privacy → Microphone

**Translation fails or returns errors**
- Solution: Verify API keys are correct and have sufficient credits
- Check OpenAI account: https://platform.openai.com/usage
- Check ElevenLabs account: https://elevenlabs.io/subscription

**Other apps can't hear the translated audio**
- Solution: Check the virtual microphone setup
- Try the "📢 Test Virtual Mic" button
- Verify other apps can see the "Virtual Microphone Output" device

**Something else isn't working**
- Solution: Use the debug console to identify issues
- Click "Show Debug Console" to see real-time logs
- Verify all API keys are configured correctly
The debug console shows real-time information:
- API requests and responses
- Audio processing status
- Error messages and warnings
- Performance metrics
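A debug-console entry could carry the fields listed above; the exact shape and format string here are assumptions for illustration, not the console's documented output.

```typescript
// Illustrative shape of a debug-console entry and its rendered line.
interface DebugEntry {
  level: "info" | "warn" | "error";
  source: string; // e.g. "AudioCaptureService"
  message: string;
  timestamp: Date;
}

function formatEntry(e: DebugEntry): string {
  // e.g. "[2024-01-01T00:00:00.000Z] INFO AudioCaptureService: capture started"
  return `[${e.timestamp.toISOString()}] ${e.level.toUpperCase()} ${e.source}: ${e.message}`;
}
```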
```
src/
├── services/                        # Core business logic
│   ├── ProcessingOrchestrator.ts    # Main pipeline coordinator
│   ├── AudioCaptureService.ts       # Microphone input handling
│   ├── TranslationServiceManager.ts # OpenAI translation
│   ├── TextToSpeechManager.ts       # ElevenLabs TTS
│   ├── VirtualMicrophoneManager.ts  # Audio output routing
│   └── ConfigurationManager.ts      # Settings management
├── ui/                              # User interface components
├── ipc/                             # Inter-process communication
├── types/                           # TypeScript definitions
├── main.ts                          # Electron main process
├── renderer.ts                      # UI process
└── index.html                       # Application interface
```
```bash
# Development with hot reload
npm run dev

# Simple development build and run
npm run dev:simple

# Production build
npm run build

# Watch mode for development
npm run build:watch

# Clean build artifacts
npm run clean

# Run tests
npm test
```

The Real-Time Voice Translator is now fully functional with:
✅ Complete translation pipeline (Audio → Text → Translation → Speech → Output)
✅ Dual output modes (Test to headphones, Live to virtual microphone)
✅ Professional UI with real-time status monitoring
✅ Comprehensive testing tools for verification
✅ Persistent configuration with API key management
Start translating in real-time today! 🌍🎙️
MIT License - see LICENSE file for details