A Python script that uses AssemblyAI to separate speakers from an audio file (like a podcast) into individual audio tracks. This is particularly useful for tasks like creating animations with NVIDIA's Audio2Face or analyzing individual speaker contributions.
This project was originally forked from an abandoned repository and has been significantly improved for better accuracy and usability, especially for non-English languages.
- Speaker Diarization: Identifies and separates different speakers in an audio file.
- Multi-Language Support: Works with various languages supported by AssemblyAI, with improved accuracy for specified languages (e.g., Hindi).
- Simple Command-Line Interface: Easy to use with just a few command-line arguments.
- Flexible Output: Generates separate WAV files for each speaker, preserving the original timeline with silence.
The script relies on the AssemblyAI API for transcription and speaker diarization:
- Upload & Transcribe: The audio file is uploaded to AssemblyAI.
- Speaker Diarization: AssemblyAI processes the audio to detect who spoke and when, returning precise timestamps for each utterance.
- Audio Slicing: The script uses the pydub library to slice the original audio file based on these timestamps.
- Export: It creates a separate audio track for each speaker, filling the non-speaking parts with silence to maintain the original timing, and exports them as .wav files.
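
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code. `fetch_utterances` assumes the official `assemblyai` Python SDK. The slicing uses the standard-library `wave` module in place of pydub so the example runs without extra dependencies, which limits it to 16-bit PCM WAV data. All function names here are hypothetical.

```python
import wave


def fetch_utterances(path, api_key, num_speakers, language=None):
    """Upload `path` to AssemblyAI and return (speaker, start_ms, end_ms) tuples.

    Assumption: the official `assemblyai` SDK is installed. It is imported
    lazily so the rest of this sketch works without it.
    """
    import assemblyai as aai

    aai.settings.api_key = api_key
    config = aai.TranscriptionConfig(
        speaker_labels=True,              # turn on speaker diarization
        speakers_expected=num_speakers,
        language_code=language,
    )
    transcript = aai.Transcriber().transcribe(path, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return [(u.speaker, u.start, u.end) for u in transcript.utterances]


def split_by_speaker(pcm, frame_rate, frame_width, utterances):
    """Return {speaker: pcm_bytes}, each track exactly as long as `pcm`.

    `frame_width` is bytes per frame (sample width * channels). Zero bytes
    are treated as silence, which holds for signed 16-bit PCM.
    """
    tracks = {}
    total_frames = len(pcm) // frame_width
    for speaker, start_ms, end_ms in utterances:
        # Each speaker's track starts out as full-length silence.
        track = tracks.setdefault(speaker, bytearray(len(pcm)))
        # Convert milliseconds to frame indices, clamped to the audio length.
        first = min(start_ms * frame_rate // 1000, total_frames)
        last = min(end_ms * frame_rate // 1000, total_frames)
        lo, hi = first * frame_width, last * frame_width
        # Copy only this utterance; everything else stays silent.
        track[lo:hi] = pcm[lo:hi]
    return {speaker: bytes(track) for speaker, track in tracks.items()}


def write_wav(path, pcm, frame_rate, channels=1, sample_width=2):
    """Write raw PCM bytes to `path` (a string path or binary file object)."""
    with wave.open(path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(frame_rate)
        out.writeframes(pcm)
```

Because every track is created at the full length of the input and only the speaker's own utterances are copied in, the tracks stay aligned with the original timeline and can be played back together.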
- Python 3.x
- An AssemblyAI API Key. You can get one for free from the AssemblyAI website.
- Clone the repository (or download the files):

      # If you are using git
      git clone https://github.com/SiaLabs/speaker-separation.git
      cd speaker-separation

- Install the required Python packages:

      pip install -r requirements.txt

- Set up your environment variables: create a file named .env in the project root and add your AssemblyAI API key:

      ASSEMBLYAI_API_KEY="your_api_key_here"
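
To illustrate how a key stored this way can be read, here is a minimal standard-library sketch. It is an assumption, not the script's own loading code: many projects use a package such as python-dotenv for this, and the helper names `parse_env` and `require_api_key` are hypothetical.

```python
def parse_env(text):
    """Parse KEY=VALUE lines (as found in a .env file) into a dict."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def require_api_key(values, name="ASSEMBLYAI_API_KEY"):
    """Return the API key, failing early with a clear message if it is missing."""
    key = values.get(name, "")
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

Checking for the key before uploading anything gives a clearer error than a failed API request would.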
Run the script from your terminal, providing the filename, number of speakers, and optionally the language.
    python speaker_separator.py --filename="path/to/your/audio.wav" --numspeakers=2 --language="hi"

- --filename: The path to your audio file (MP3 or WAV).
- --numspeakers: The number of speakers in the audio.
- --language: (Optional) The language code of the audio (e.g., en for English, hi for Hindi). Providing this improves accuracy. See the list of supported languages on the AssemblyAI website.
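
A command-line interface with these three arguments could be declared with `argparse` along the following lines. This is a sketch based only on the options documented here. The script's actual parser, defaults, and help text may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser for the three documented options."""
    parser = argparse.ArgumentParser(
        description="Split an audio file into one track per speaker."
    )
    parser.add_argument("--filename", required=True,
                        help="Path to the input audio file (MP3 or WAV).")
    parser.add_argument("--numspeakers", type=int, required=True,
                        help="Number of speakers in the audio.")
    # Assumption: omitting --language leaves the language unspecified.
    parser.add_argument("--language", default=None,
                        help="Optional language code, e.g. en or hi.")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--filename=audio.wav", "--numspeakers=2", "--language=hi"]
)
```

Declaring `--numspeakers` with `type=int` means a non-numeric value is rejected by the parser before any audio is uploaded.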
The output files will be saved in the output/ directory with descriptive names like your_audio_speaker_A.wav and your_audio_speaker_B.wav.
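
The naming pattern can be expressed with `pathlib` as shown below. This is a hypothetical helper (`speaker_output_path`) that reproduces the documented pattern. It is not taken from the script.

```python
from pathlib import Path


def speaker_output_path(input_path, speaker, output_dir="output"):
    """Build output/<input stem>_speaker_<label>.wav for one speaker."""
    stem = Path(input_path).stem  # file name without directory or extension
    return Path(output_dir) / f"{stem}_speaker_{speaker}.wav"
```

For an input of `path/to/your_audio.wav` and speaker label `A`, this yields `output/your_audio_speaker_A.wav`.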
This project is licensed under the MIT License - see the LICENSE file for details.