naver-ai/OpenSetSID

Open-Set Speaker Identification Evaluation Set Generator

This repository provides tools to generate diverse evaluation sets for open-set speaker identification experiments using the VoxBlink2 dataset. From a pre-selected pool of 500 speakers, you can create multiple enrollment and evaluation sets by randomly sampling speakers.

Overview

Open-set speaker identification is a task where the system must identify a speaker from a set of enrolled speakers, while also detecting when a test utterance belongs to an unknown (non-enrolled) speaker. This tool enables reproducible experiments by generating multiple trials with different speaker subsets.
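The decision rule described above can be sketched as follows. This is a minimal illustration, not part of this repository: it assumes each speaker is represented by a single non-zero embedding vector, that cosine similarity is the scoring function, and that a fixed `threshold` separates enrolled from unknown speakers. The names `cosine`, `identify`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(test_emb, enrolled, threshold=0.5):
    """Return the best-matching enrolled speaker ID, or "unknown".

    enrolled: dict mapping speaker ID -> embedding (assumed format).
    threshold: assumed fixed rejection threshold on the best score.
    """
    best_id, best_score = None, float("-inf")
    for spk, emb in enrolled.items():
        score = cosine(test_emb, emb)
        if score > best_score:
            best_id, best_score = spk, score
    # Reject the utterance when even the best match is not similar enough.
    if best_id is None or best_score < threshold:
        return "unknown"
    return best_id
```

Lowering the threshold accepts more utterances as enrolled speakers, which tends to raise false alarms on unknown speakers; raising it does the opposite.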

Files

| File | Description |
| --- | --- |
| make_evalset.py | Script to generate enrollment and evaluation JSON files |
| spk_enroll_info.json | Pre-selected enrollment utterances for 500 speakers |
| spk_eval_info.json | Pre-selected evaluation utterances for 500 speakers |

Requirements

  • Python 3.7+
  • No additional dependencies required (uses only standard library)

Usage

python make_evalset.py --num_speakers <N> [--num_trials <T>] [--seed <S>] [--output_dir <DIR>]

Arguments

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| --num_speakers | Yes | - | Number of enrolled speakers (1-300) |
| --num_trials | No | 100 | Number of evaluation sets to generate |
| --seed | No | 42 | Random seed for reproducibility |
| --output_dir | No | evalsets | Output directory |
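For readers who want to wrap or reimplement the command line, a parser consistent with the table above might look like the sketch below. It is an assumption based only on the documented arguments; the actual parser in make_evalset.py may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Generate open-set speaker identification evaluation sets."
    )
    parser.add_argument(
        "--num_speakers", type=int, required=True,
        choices=range(1, 301), metavar="N",
        help="number of enrolled speakers (1-300)",
    )
    parser.add_argument("--num_trials", type=int, default=100,
                        help="number of evaluation sets to generate")
    parser.add_argument("--seed", type=int, default=42,
                        help="random seed for reproducibility")
    parser.add_argument("--output_dir", default="evalsets",
                        help="output directory")
    return parser
```

Passing a value outside 1-300 makes `parse_args` print an error and exit, which matches the documented range.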

Example

# Generate 100 evaluation sets with 300 enrolled speakers
python make_evalset.py --num_speakers 300

# Generate 50 evaluation sets with 100 enrolled speakers
python make_evalset.py --num_speakers 100 --num_trials 50 --seed 123
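Conceptually, each trial samples a subset of the speaker pool for enrollment and evaluates on utterances from the pool. The sketch below illustrates one way to do this. It is not the code in make_evalset.py: it assumes that both info files map a speaker ID to a list of utterance paths and that every non-enrolled speaker in the pool is treated as unknown. The function name `sample_trial` is hypothetical.

```python
import random


def sample_trial(enroll_info, eval_info, num_speakers, rng):
    """Sample one enrollment/evaluation pair from the speaker pool.

    enroll_info, eval_info: dicts of speaker ID -> list of paths (assumed).
    rng: a random.Random instance, so that trials are reproducible.
    """
    speakers = sorted(enroll_info)  # fixed order before sampling
    enrolled = set(rng.sample(speakers, num_speakers))
    enroll = {spk: enroll_info[spk] for spk in speakers if spk in enrolled}
    evalset = [
        {
            "audio_path": path,
            # Enrolled speakers keep their ID; all others become "unknown".
            "label": spk if spk in enrolled else "unknown",
            "speaker_id": spk,
        }
        for spk in sorted(eval_info)
        for path in eval_info[spk]
    ]
    return enroll, evalset
```

Seeding a single `random.Random(seed)` once and reusing it across trials yields different speaker subsets per trial while keeping the whole run repeatable.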

Output Format

The script creates the following directory structure:

evalsets/
└── num_spk_<N>/
    ├── enroll_000.json
    ├── eval_000.json
    ├── enroll_001.json
    ├── eval_001.json
    ├── ...
    ├── enroll_<T-1>.json
    └── eval_<T-1>.json

  • <N>: Number of enrolled speakers (--num_speakers)
  • <T>: Number of trials (--num_trials, default: 100)
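To run an experiment over all generated trials, the paired files can be loaded by walking this directory layout. The helper below is a small sketch that relies only on the file naming shown above; `iter_trials` is a hypothetical name, not a function provided by this repository.

```python
import json
from pathlib import Path


def iter_trials(output_dir, num_speakers):
    """Yield (trial_index, enroll, evalset) for every trial in the directory."""
    trial_dir = Path(output_dir) / f"num_spk_{num_speakers}"
    # Zero-padded indices sort correctly in lexicographic order.
    for enroll_path in sorted(trial_dir.glob("enroll_*.json")):
        eval_name = enroll_path.name.replace("enroll_", "eval_", 1)
        eval_path = enroll_path.with_name(eval_name)
        with open(enroll_path, encoding="utf-8") as f:
            enroll = json.load(f)
        with open(eval_path, encoding="utf-8") as f:
            evalset = json.load(f)
        yield enroll_path.stem.split("_")[-1], enroll, evalset
```

For example, `for idx, enroll, evalset in iter_trials("evalsets", 300): ...` visits each enrollment and evaluation pair in index order.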

Enrollment JSON (enroll_XXX.json)

{
    "speaker_id_1": ["path/to/utt1.wav", "path/to/utt2.wav", ...],
    "speaker_id_2": ["path/to/utt1.wav", ...],
    ...
}

Evaluation JSON (eval_XXX.json)

[
    {
        "audio_path": "path/to/utterance.wav",
        "label": "speaker_id" or "unknown",
        "speaker_id": "actual_speaker_id"
    },
    ...
]

  • label: Speaker ID if the speaker is enrolled, "unknown" otherwise
  • speaker_id: Ground truth speaker ID (for analysis purposes)
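Given one predicted label per evaluation entry, the `label` field is enough to score a system. The sketch below computes two simple quantities: accuracy on utterances from enrolled speakers and the false alarm rate on utterances from unknown speakers. These definitions are assumptions made for illustration and are not necessarily the metrics used in the cited paper; `score_trial` is a hypothetical name.

```python
def score_trial(evalset, predictions):
    """Score predictions against an evaluation list.

    evalset: list of dicts with a "label" field, as in eval_XXX.json.
    predictions: list of predicted labels, aligned with evalset.
    """
    n_known = n_unknown = correct = false_alarms = 0
    for item, pred in zip(evalset, predictions):
        if item["label"] == "unknown":
            n_unknown += 1
            # A false alarm: an unknown speaker accepted as an enrolled one.
            false_alarms += pred != "unknown"
        else:
            n_known += 1
            correct += pred == item["label"]
    return {
        "known_accuracy": correct / n_known if n_known else None,
        "false_alarm_rate": false_alarms / n_unknown if n_unknown else None,
    }
```

Averaging these values over all trials for a given number of enrolled speakers gives a more stable estimate than any single trial.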

Audio Data

This evaluation protocol is designed for the VoxBlink2 dataset. You need to download the VoxBlink2 dataset separately and ensure the audio paths in the JSON files are correctly mapped to your local setup.
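One way to map the stored paths onto a local copy of the dataset is sketched below. How the paths are stored in the JSON files is not documented here, so this is an assumption: the sketch handles relative paths by prefixing a local root, and paths under a known old prefix by replacing that prefix. The names `remap_path`, `new_root`, and `old_root` are hypothetical.

```python
from pathlib import PurePosixPath


def remap_path(path, new_root, old_root=""):
    """Rebase a stored audio path onto a local dataset root.

    If old_root is given, it is stripped from the path first; a ValueError
    is raised when the path does not start with old_root.
    """
    p = PurePosixPath(path)
    if old_root:
        p = p.relative_to(old_root)
    return str(PurePosixPath(new_root) / p)


def remap_enroll(enroll, new_root, old_root=""):
    """Apply remap_path to every utterance in an enrollment dict."""
    return {
        spk: [remap_path(u, new_root, old_root) for u in utts]
        for spk, utts in enroll.items()
    }


def remap_eval(evalset, new_root, old_root=""):
    """Apply remap_path to the audio_path of every evaluation entry."""
    return [
        {**item, "audio_path": remap_path(item["audio_path"], new_root, old_root)}
        for item in evalset
    ]
```

Note that an absolute stored path with no `old_root` is returned unchanged, because joining onto an absolute path discards the new root.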

License

This project is licensed under the MIT License.

Dataset License

The VoxBlink2 dataset is licensed under CC BY-NC-SA 4.0. Please ensure compliance with the dataset license when using it for your research.

MIT License

Copyright (c) 2026-present NAVER Cloud Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

References

VoxBlink2 Dataset

If you use this evaluation protocol, please cite the VoxBlink2 dataset:

@misc{lin2024voxblink2100kspeakerrecognition,
      title={VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark}, 
      author={Yuke Lin and Ming Cheng and Fulin Zhang and Yingying Gao and Shilei Zhang and Ming Li},
      year={2024},
      eprint={2407.11510},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2407.11510}, 
}

VoxBlink2 Resources:

Citation

If you use this evaluation set generator in your research, please cite:

@inproceedings{heo2026icassp,
    title={MITIGATING FALSE ALARMS IN OPEN-SET SPEAKER IDENTIFICATION WITH A DECOUPLED FRAMEWORK},
    author={Heo, Hee-Soo and Lee, Minjae and Kwon, Youngki and Kim, Han-Gyu and Lee, Bong-Jin},
    booktitle={Proc. ICASSP},
    year={2026}
}
