Speaker Separation Tool

A Python script that uses AssemblyAI to separate speakers from an audio file (like a podcast) into individual audio tracks. This is particularly useful for tasks like creating animations with NVIDIA's Audio2Face or analyzing individual speaker contributions.

This project was originally forked from an abandoned repository and has been significantly improved for better accuracy and usability, especially for non-English languages.

Key Features

Speaker Diarization: Identifies and separates different speakers in an audio file.
Multi-Language Support: Works with various languages supported by AssemblyAI, with improved accuracy for specified languages (e.g., Hindi).
Simple Command-Line Interface: Easy to use with just a few command-line arguments.
Flexible Output: Generates separate WAV files for each speaker, preserving the original timeline with silence.

How It Works

The script relies on the AssemblyAI API for transcription and speaker diarization, then slices the audio locally.

Upload & Transcribe: The audio file is uploaded to AssemblyAI for transcription.
Speaker Diarization: AssemblyAI processes the audio to detect who spoke and when, returning precise timestamps for each utterance.
Audio Slicing: The script uses the pydub library to slice the original audio file based on these timestamps.
Export: It creates a separate audio track for each speaker, filling the non-speaking parts with silence to maintain the original timing, and exports them as .wav files.
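
The slicing and export steps can be illustrated with a minimal sketch. It is not the script's actual code: the script uses pydub, while this version works directly on raw PCM bytes with only the standard library so that it runs without extra dependencies. The function name `build_speaker_tracks` and the `(speaker, start_ms, end_ms)` tuple layout are assumptions made for illustration; they stand in for the utterance timestamps returned by diarization.

```python
def build_speaker_tracks(pcm, frame_rate, sample_width, channels, utterances):
    """Return {speaker: bytes}: one full-length track per speaker.

    pcm        -- raw signed PCM audio (e.g. 16-bit WAV frames)
    utterances -- iterable of (speaker, start_ms, end_ms) tuples (hypothetical layout)
    """
    frame_size = sample_width * channels  # bytes per audio frame
    tracks = {}
    for speaker, start_ms, end_ms in utterances:
        if speaker not in tracks:
            # A zero-filled buffer is silence for signed PCM
            # (not for 8-bit unsigned WAV, where silence is 128).
            tracks[speaker] = bytearray(len(pcm))
        # Convert milliseconds to byte offsets aligned on frame boundaries.
        start = (start_ms * frame_rate // 1000) * frame_size
        end = min((end_ms * frame_rate // 1000) * frame_size, len(pcm))
        if start < end:
            # Copy the utterance into place; everything else stays silent.
            tracks[speaker][start:end] = pcm[start:end]
    return {speaker: bytes(track) for speaker, track in tracks.items()}


# 10 ms of dummy 16-bit mono audio at 1000 frames/s (2 bytes per millisecond)
pcm = b"\x01\x02" * 10
tracks = build_speaker_tracks(
    pcm, frame_rate=1000, sample_width=2, channels=1,
    utterances=[("A", 0, 3), ("B", 3, 10), ("A", 8, 10)],
)
```

Because every track has the same length as the input, the per-speaker files stay aligned with the original timeline when played back together.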

Getting Started

Prerequisites

Python 3.x
An AssemblyAI API Key. You can get one for free from the AssemblyAI website.

Installation

Clone the repository (or download the files):

```
# If you are using git
git clone https://github.com/SiaLabs/speaker-separation.git
cd speaker-separation
```

Install the required Python packages:
```
pip install -r requirements.txt
```
Set up your environment variables: Create a file named .env in the project root and add your AssemblyAI API key:
```
ASSEMBLYAI_API_KEY="your_api_key_here"
```
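
How the script loads this file is not shown here; a common choice is the python-dotenv package. As a dependency-free illustration of the file format, the sketch below parses `KEY="value"` lines by hand and fails early if the key is missing. The function name `read_dotenv` is hypothetical, and the temporary file only stands in for the project's real `.env`.

```python
import os
import tempfile


def read_dotenv(path):
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip("\"'")
    return values


# Demo with a throwaway file standing in for the project's .env
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write('ASSEMBLYAI_API_KEY="your_api_key_here"\n')
    config = read_dotenv(env_path)

api_key = config.get("ASSEMBLYAI_API_KEY")
if not api_key:
    raise SystemExit("ASSEMBLYAI_API_KEY is missing from .env")
```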

Usage

Run the script from your terminal, providing the filename, number of speakers, and optionally the language.

```
python speaker_separator.py --filename="path/to/your/audio.wav" --numspeakers=2 --language="hi"
```

--filename: The path to your audio file (MP3 or WAV).
--numspeakers: The number of speakers in the audio.
--language: (Optional) The language code of the audio (e.g., en for English, hi for Hindi). Providing this improves accuracy. See the list of supported languages on the AssemblyAI website.

The output files will be saved in the output/ directory with descriptive names like your_audio_speaker_A.wav and your_audio_speaker_B.wav.
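
For readers who want to see how such an interface might be wired up, here is a minimal `argparse` sketch using the three flags documented above. It is an illustration, not the script's source: the helper names `build_parser` and `output_path` are hypothetical, and treating `--filename` and `--numspeakers` as required is an assumption.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(
        description="Separate speakers into individual audio tracks."
    )
    # Assumption: the two non-optional flags are enforced by the parser.
    parser.add_argument("--filename", required=True,
                        help="Path to the audio file (MP3 or WAV)")
    parser.add_argument("--numspeakers", type=int, required=True,
                        help="Number of speakers in the audio")
    parser.add_argument("--language", default=None,
                        help="Optional language code, e.g. en or hi")
    return parser


def output_path(filename, speaker, output_dir="output"):
    """Build a name of the form output/<input stem>_speaker_<label>.wav."""
    return Path(output_dir) / f"{Path(filename).stem}_speaker_{speaker}.wav"


# Parse the same arguments as the usage example above.
args = build_parser().parse_args(
    ["--filename=path/to/your/audio.wav", "--numspeakers=2", "--language=hi"]
)
paths = [output_path(args.filename, speaker) for speaker in ("A", "B")]
```

Under these assumptions, an input named `audio.wav` would produce `output/audio_speaker_A.wav` and `output/audio_speaker_B.wav`.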

License

This project is licensed under the MIT License - see the LICENSE file for details.
