Real-Time Voice Translator

A cross-platform desktop application built with Electron that provides real-time voice translation during voice calls. The application captures audio from a selected microphone, translates the speech in real time, and outputs the translated audio through a virtual microphone that other applications can use as their input device.

🎯 Key Features

✅ Working Translation Pipeline

Real-time audio capture from selected microphone devices
Speech-to-text processing using OpenAI Whisper API
Text translation with OpenAI GPT models
Text-to-speech synthesis using ElevenLabs API
Virtual microphone output for integration with voice call applications
Test mode that outputs audio to headphones for verification

🎮 User Interface

Professional UI with modern design and animations
Device selection (microphones with real-time detection)
Language selection (20+ languages with flag icons)
Voice selection from ElevenLabs voice library
Settings modal for API key management
Debug console for real-time monitoring
Status indicators with visual feedback
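
As an illustration of how the microphone dropdown could be populated, here is a minimal sketch. The names DeviceLike, MicrophoneOption, and toMicrophoneOptions are hypothetical and not taken from this repository; the input is assumed to have the same fields as the browser's MediaDeviceInfo objects.

```typescript
// Minimal structural subset of the browser's MediaDeviceInfo.
interface DeviceLike {
  deviceId: string;
  kind: string;
  label: string;
}

// Hypothetical shape of one entry in the microphone dropdown.
interface MicrophoneOption {
  id: string;
  name: string;
}

// Keeps only audio inputs and gives unlabeled devices a readable fallback name.
function toMicrophoneOptions(devices: DeviceLike[]): MicrophoneOption[] {
  return devices
    .filter((d) => d.kind === 'audioinput')
    .map((d, i) => ({ id: d.deviceId, name: d.label || `Microphone ${i + 1}` }));
}
```

In the renderer, the input would typically come from navigator.mediaDevices.enumerateDevices(), re-queried whenever the devicechange event fires. Device labels are empty until microphone permission has been granted, which is why the fallback name is there.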

⚙️ Configuration Management

Persistent settings storage across app restarts
API key validation and secure storage
Device preferences with automatic detection
Audio quality and processing settings
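
Persistent settings can be sketched as a JSON round trip with defaults. The AppSettings field names and default values below are assumptions for illustration; the actual ConfigurationManager may store different fields in a different format.

```typescript
// Hypothetical settings shape; field names are assumptions for illustration.
interface AppSettings {
  openAiApiKey: string;
  elevenLabsApiKey: string;
  microphoneId: string | null;
  targetLanguage: string;
  voiceId: string | null;
}

const DEFAULT_SETTINGS: AppSettings = {
  openAiApiKey: '',
  elevenLabsApiKey: '',
  microphoneId: null,
  targetLanguage: 'es',
  voiceId: null,
};

// Serializes settings to text that can be written to disk.
function serializeSettings(settings: AppSettings): string {
  return JSON.stringify(settings, null, 2);
}

// Parses stored settings, falling back to defaults for missing fields
// or when the stored text is not a valid JSON object.
function parseSettings(stored: string): AppSettings {
  try {
    const parsed: unknown = JSON.parse(stored);
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
      return { ...DEFAULT_SETTINGS };
    }
    return { ...DEFAULT_SETTINGS, ...(parsed as Partial<AppSettings>) };
  } catch {
    return { ...DEFAULT_SETTINGS };
  }
}
```

Plain JSON is not secure storage on its own. In an Electron app the API keys would more likely be encrypted first, for example with Electron's safeStorage module, before being written alongside the other settings.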

🚀 How Translation Works

Test Mode (Headphone Output)

Click "🧪 Test Translation"
Translates the sample text "Hello, this is a test" into the selected target language
Plays translated audio through your headphones/speakers
Perfect for testing the pipeline before going live

Real-Time Mode (Virtual Microphone)

Click "▶️ Start Translation"
Speak into your selected microphone
Audio is processed through the complete pipeline:
- Audio Capture → Speech-to-Text → Translation → Text-to-Speech
Translated audio is sent to virtual microphone
Other apps (Zoom, Teams, Discord) can use the virtual microphone as input

Additional Test Features

🎧 Hear Yourself: Records 3 seconds from your mic and plays it back
📢 Test Virtual Mic: Sends test audio to virtual microphone for other apps

📋 Prerequisites

Node.js 18+ and npm
OpenAI API key (for speech-to-text and translation)
ElevenLabs API key (for text-to-speech)
Microphone access permissions

🛠️ Installation & Setup

1. Install Dependencies

npm install

2. Build the Application

npm run build

3. Launch the Application

npm run dev:simple

4. Configure API Keys

Click "⚙️ Settings" button
Enter your OpenAI API key
Enter your ElevenLabs API key
Click "Save Settings"

5. Set Up Translation

Select microphone from dropdown
Choose target language (e.g., Spanish or French)
Pick a voice for text-to-speech output
Test the system using the test buttons

🔑 API Key Setup

OpenAI API Key

Go to https://platform.openai.com/
Create account and generate API key
Used for: Speech-to-text (Whisper) and Translation (GPT)
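
For orientation, a translation call to OpenAI's Chat Completions endpoint could be assembled roughly as follows. This is a sketch, not the code in TranslationServiceManager: the function name, the default model, and the prompt wording are all assumptions.

```typescript
// Shape of an HTTP request that can be handed to fetch() or any HTTP client.
interface TranslationHttpRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Builds a Chat Completions request that asks the model to translate `text`.
// The default model name and the prompt wording are assumptions.
function buildTranslationRequest(
  apiKey: string,
  text: string,
  targetLanguage: string,
  model: string = 'gpt-4o-mini',
): TranslationHttpRequest {
  return {
    url: 'https://api.openai.com/v1/chat/completions',
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model,
      temperature: 0,
      messages: [
        {
          role: 'system',
          content: `Translate the user's message into ${targetLanguage}. Reply with the translation only.`,
        },
        { role: 'user', content: text },
      ],
    }),
  };
}
```

The request can be sent with fetch(request.url, request); the translated text is then found at choices[0].message.content in the JSON response. Speech-to-text goes through a separate endpoint (/v1/audio/transcriptions) that takes a multipart file upload instead of a JSON body.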

ElevenLabs API Key

Go to https://elevenlabs.io/
Create account and generate API key
Used for: Text-to-speech synthesis
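
A text-to-speech call to ElevenLabs could be assembled along these lines. Again a sketch under assumptions: buildTtsRequest is a hypothetical helper, and the default model ID is one plausible choice, not necessarily what TextToSpeechManager uses.

```typescript
// Shape of an HTTP request that can be handed to fetch() or any HTTP client.
interface TtsHttpRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Builds a text-to-speech request for one ElevenLabs voice.
// The default model ID is an assumption.
function buildTtsRequest(
  apiKey: string,
  voiceId: string,
  text: string,
  modelId: string = 'eleven_multilingual_v2',
): TtsHttpRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    method: 'POST',
    headers: {
      'xi-api-key': apiKey,
      'Content-Type': 'application/json',
      Accept: 'audio/mpeg',
    },
    body: JSON.stringify({ text, model_id: modelId }),
  };
}
```

The response body is the synthesized audio itself (binary data), not JSON, so it should be read as an ArrayBuffer before being decoded and played.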

🎮 Using the Application

Step 1: Test the Pipeline

🧪 Test Translation → Verifies complete translation pipeline
🎧 Hear Yourself → Tests microphone input (3-second recording)
📢 Test Virtual Mic → Tests virtual microphone output

Step 2: Start Real-Time Translation

▶️ Start Translation → Begins real-time processing
⏹️ Stop Translation → Stops processing

Step 3: Use with Other Applications

Open your video call app (Zoom, Teams, Discord, etc.)
Select "Virtual Microphone Output" as your microphone
Speak in your language → Others hear the translation
Monitor the debug console for real-time status

🏗️ Architecture Overview

Service Layer

ProcessingOrchestrator: Manages the complete translation pipeline
AudioCaptureService: Handles microphone input and audio processing
TranslationServiceManager: Manages OpenAI translation requests
TextToSpeechManager: Handles ElevenLabs voice synthesis
VirtualMicrophoneManager: Manages audio output routing
ConfigurationManager: Handles settings and API key storage

Audio Pipeline Flow

Microphone Input → Audio Capture → Speech-to-Text → Translation → Text-to-Speech → Virtual Microphone Output
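
The flow above can be read as a chain of asynchronous stages. The sketch below shows that chaining with hypothetical stage signatures; it is not the actual ProcessingOrchestrator API, and the PipelineStages interface is an assumption made so the example is self-contained.

```typescript
// Hypothetical stage signatures; the real services in src/services may differ.
type AudioChunk = Float32Array;

interface PipelineStages {
  speechToText(audio: AudioChunk): Promise<string>;
  translate(text: string, targetLanguage: string): Promise<string>;
  textToSpeech(text: string): Promise<AudioChunk>;
  output(audio: AudioChunk): Promise<void>;
}

// Runs one captured chunk through every stage in order.
// Returns the translated text, or null when no speech was recognized.
async function processChunk(
  stages: PipelineStages,
  audio: AudioChunk,
  targetLanguage: string,
): Promise<string | null> {
  const transcript = (await stages.speechToText(audio)).trim();
  if (transcript.length === 0) {
    return null; // Silence or noise: skip translation and synthesis.
  }
  const translated = await stages.translate(transcript, targetLanguage);
  const speech = await stages.textToSpeech(translated);
  await stages.output(speech);
  return translated;
}
```

Skipping empty transcripts early avoids paying for translation and synthesis calls when a chunk contains no speech. Because each stage is awaited in sequence, end-to-end latency is roughly the sum of the three API round trips.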

Dual Output Modes

Test Mode: Audio → System Speakers (for testing)
Live Mode: Audio → Virtual Microphone (for other apps)
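
One way to picture the two modes is a single switch that picks the destination for synthesized audio. The OutputMode, AudioSink, and selectSink names below are hypothetical; the real VirtualMicrophoneManager API is not shown in this README.

```typescript
// Hypothetical names for illustration only.
type OutputMode = 'test' | 'live';

interface AudioSink {
  play(samples: Float32Array): void;
}

// Picks the sink for the current mode: speakers in test mode,
// the virtual microphone in live mode.
function selectSink(mode: OutputMode, speakers: AudioSink, virtualMic: AudioSink): AudioSink {
  return mode === 'test' ? speakers : virtualMic;
}
```

Keeping both destinations behind one interface means the rest of the pipeline stays identical in both modes; only the final hop changes.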

🐛 Troubleshooting

Common Issues & Solutions

❌ "Microphone access denied"

Solution: Grant microphone permissions in system settings
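Windows: Settings → Privacy → Microphone (Settings → Privacy & security → Microphone on Windows 11)
macOS: System Settings → Privacy & Security → Microphone (System Preferences → Security & Privacy → Privacy → Microphone on macOS 12 and earlier)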

❌ "API key validation failed"

Solution: Verify API keys are correct and have sufficient credits
Check OpenAI account: https://platform.openai.com/usage
Check ElevenLabs account: https://elevenlabs.io/subscription
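
When narrowing down a validation failure, the HTTP status of a lightweight authenticated request usually tells you which case you are in. The helper below is a hypothetical sketch, not the app's own validation logic; the mapping follows general HTTP conventions and OpenAI's documented error codes, and ElevenLabs may report quota problems differently.

```typescript
type KeyStatus = 'valid' | 'invalid' | 'rate_limited_or_out_of_quota' | 'unknown';

// Maps the HTTP status of an authenticated test request to a likely cause.
function classifyKeyCheck(httpStatus: number): KeyStatus {
  if (httpStatus >= 200 && httpStatus < 300) return 'valid';
  if (httpStatus === 401 || httpStatus === 403) return 'invalid';
  if (httpStatus === 429) return 'rate_limited_or_out_of_quota';
  return 'unknown'; // e.g. 5xx: likely a provider-side or network problem
}
```

For OpenAI, the status could come from a GET request to https://api.openai.com/v1/models sent with the Authorization: Bearer header. An 'invalid' result points at the key itself, while 'rate_limited_or_out_of_quota' points at the usage and billing pages linked above.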

❌ "No audio output"

Solution: Check virtual microphone setup
Try "📢 Test Virtual Mic" button
Verify other apps can see "Virtual Microphone Output" device

❌ "Translation not working"

Solution: Use debug console to identify issues
Click "Show Debug Console" to see real-time logs
Verify all API keys are configured correctly

Debug Console

The debug console shows real-time information:

API requests and responses
Audio processing status
Error messages and warnings
Performance metrics

📁 Project Structure

src/
├── services/           # Core business logic
│   ├── ProcessingOrchestrator.ts      # Main pipeline coordinator
│   ├── AudioCaptureService.ts         # Microphone input handling
│   ├── TranslationServiceManager.ts   # OpenAI translation
│   ├── TextToSpeechManager.ts         # ElevenLabs TTS
│   ├── VirtualMicrophoneManager.ts    # Audio output routing
│   └── ConfigurationManager.ts        # Settings management
├── ui/                 # User interface components
├── ipc/               # Inter-process communication
├── types/             # TypeScript definitions
├── main.ts            # Electron main process
├── renderer.ts        # UI process
└── index.html         # Application interface

🚀 Development Commands

# Development with hot reload
npm run dev

# Simple development build and run  
npm run dev:simple

# Production build
npm run build

# Watch mode for development
npm run build:watch

# Clean build artifacts
npm run clean

# Run tests
npm test

🎉 Ready to Use!

The Real-Time Voice Translator is now fully functional with:

✅ Complete translation pipeline (Audio → Text → Translation → Speech → Output)
✅ Dual output modes (Test to headphones, Live to virtual microphone)
✅ Professional UI with real-time status monitoring
✅ Comprehensive testing tools for verification
✅ Persistent configuration with API key management

Start translating in real-time today! 🌍🎙️

📄 License

MIT License - see LICENSE file for details


bluebillshtml/VoiceTranslationMod
