This repository provides tools to generate diverse evaluation sets for open-set speaker identification experiments using the VoxBlink2 dataset. From a pre-selected pool of 500 speakers, you can create multiple enrollment and evaluation sets by randomly sampling speakers.
Open-set speaker identification is a task where the system must identify a speaker from a set of enrolled speakers, while also detecting when a test utterance belongs to an unknown (non-enrolled) speaker. This tool enables reproducible experiments by generating multiple trials with different speaker subsets.
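To make the task concrete, the decision rule in open-set identification can be sketched as follows. This is a minimal illustration, not the repository's code: the `cosine` and `identify` helpers, the embeddings, and the threshold value are all hypothetical, standing in for whatever speaker-embedding model and scoring backend you use.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(test_emb, enrolled, threshold=0.5):
    """Return the best-matching enrolled speaker ID, or 'unknown'
    if no enrolled speaker scores above the threshold."""
    best_spk, best_score = max(
        ((spk, cosine(test_emb, emb)) for spk, emb in enrolled.items()),
        key=lambda pair: pair[1],
    )
    return best_spk if best_score >= threshold else "unknown"

# Toy 2-D embeddings for two enrolled speakers.
enrolled = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}
identify([0.9, 0.1], enrolled)                  # -> "spk_a"
identify([0.7, 0.7], enrolled, threshold=0.9)   # -> "unknown"
```

The threshold controls the trade-off between rejecting unknown speakers and falsely rejecting enrolled ones; the evaluation sets generated here let you measure that trade-off across many random speaker subsets.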
| File | Description |
|---|---|
| `make_evalset.py` | Script to generate enrollment and evaluation JSON files |
| `spk_enroll_info.json` | Pre-selected enrollment utterances for 500 speakers |
| `spk_eval_info.json` | Pre-selected evaluation utterances for 500 speakers |
- Python 3.7+
- No additional dependencies required (uses only standard library)
```bash
python make_evalset.py --num_speakers <N> [--num_trials <T>] [--seed <S>] [--output_dir <DIR>]
```

| Argument | Required | Default | Description |
|---|---|---|---|
| `--num_speakers` | Yes | - | Number of enrolled speakers (1-300) |
| `--num_trials` | No | 100 | Number of evaluation sets to generate |
| `--seed` | No | 42 | Random seed for reproducibility |
| `--output_dir` | No | `evalsets` | Output directory |
```bash
# Generate 100 evaluation sets with 300 enrolled speakers
python make_evalset.py --num_speakers 300

# Generate 50 evaluation sets with 100 enrolled speakers
python make_evalset.py --num_speakers 100 --num_trials 50 --seed 123
```

The script creates the following directory structure:
```
evalsets/
└── num_spk_<N>/
    ├── enroll_000.json
    ├── eval_000.json
    ├── enroll_001.json
    ├── eval_001.json
    ├── ...
    ├── enroll_<T-1>.json
    └── eval_<T-1>.json
```

- `<N>`: Number of enrolled speakers (`--num_speakers`)
- `<T>`: Number of trials (`--num_trials`, default: 100)
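Given the directory layout above, the paired enrollment and evaluation files for one setting can be iterated like this. This is a sketch, not part of the repository: the `iter_trials` helper is hypothetical, and assumes the default naming produced by `make_evalset.py`.

```python
import json
from pathlib import Path

def iter_trials(output_dir="evalsets", num_speakers=300):
    """Yield (trial_index, enroll_dict, eval_items) for each generated
    trial under evalsets/num_spk_<N>/."""
    trial_dir = Path(output_dir) / f"num_spk_{num_speakers}"
    for enroll_path in sorted(trial_dir.glob("enroll_*.json")):
        idx = enroll_path.stem.split("_")[1]          # e.g. "000"
        eval_path = trial_dir / f"eval_{idx}.json"
        with open(enroll_path) as f:
            enroll = json.load(f)
        with open(eval_path) as f:
            eval_items = json.load(f)
        yield idx, enroll, eval_items
```

Each yielded pair corresponds to one random speaker subset, so aggregating a metric over all trials gives its distribution across subsets.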
Enrollment files (`enroll_*.json`) map each enrolled speaker ID to its enrollment utterances:

```json
{
  "speaker_id_1": ["path/to/utt1.wav", "path/to/utt2.wav", ...],
  "speaker_id_2": ["path/to/utt1.wav", ...],
  ...
}
```

Evaluation files (`eval_*.json`) contain a list of test items:

```json
[
  {
    "audio_path": "path/to/utterance.wav",
    "label": "speaker_id" or "unknown",
    "speaker_id": "actual_speaker_id"
  },
  ...
]
```

- `label`: Speaker ID if the speaker is enrolled, `"unknown"` if not enrolled
- `speaker_id`: Ground truth speaker ID (for analysis purposes)
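With the evaluation format above, scoring a system reduces to comparing each item's predicted ID against `label`. A minimal sketch (the `score` helper is hypothetical, not part of the repository; `predict` stands in for your identification system and takes an audio path):

```python
def score(eval_items, predict):
    """Compute identification accuracy on enrolled items and the
    false-alarm rate on unknown items.

    eval_items: list of dicts with "audio_path" and "label" keys,
                as in the eval_*.json format.
    predict:    callable mapping an audio path to a speaker ID
                or the string "unknown".
    """
    correct = known = false_alarms = unknown = 0
    for item in eval_items:
        pred = predict(item["audio_path"])
        if item["label"] == "unknown":
            unknown += 1
            if pred != "unknown":
                false_alarms += 1       # accepted a non-enrolled speaker
        else:
            known += 1
            if pred == item["label"]:
                correct += 1            # correct closed-set decision
    acc = correct / known if known else 0.0
    far = false_alarms / unknown if unknown else 0.0
    return acc, far
```

The `speaker_id` field is not used for scoring here; it lets you analyze which non-enrolled speakers trigger false alarms.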
This evaluation protocol is designed for the VoxBlink2 dataset. You need to download the VoxBlink2 dataset separately and ensure the audio paths in the JSON files are correctly mapped to your local setup.
This project is licensed under the MIT License.
The VoxBlink2 dataset is licensed under CC BY-NC-SA 4.0. Please ensure compliance with the dataset license when using it for your research.
MIT License
Copyright (c) 2026-present NAVER Cloud Corp.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
If you use this evaluation protocol, please cite the VoxBlink2 dataset:
```bibtex
@misc{lin2024voxblink2100kspeakerrecognition,
  title={VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark},
  author={Yuke Lin and Ming Cheng and Fulin Zhang and Yingying Gao and Shilei Zhang and Ming Li},
  year={2024},
  eprint={2407.11510},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2407.11510},
}
```

VoxBlink2 Resources:
- Paper: VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
- Dataset: https://voxblink2.github.io/
If you use this evaluation set generator in your research, please cite:
```bibtex
@inproceedings{heo2026icassp,
  title={MITIGATING FALSE ALARMS IN OPEN-SET SPEAKER IDENTIFICATION WITH A DECOUPLED FRAMEWORK},
  author={Heo, Hee-Soo and Lee, Minjae and Kwon, Youngki and Kim, Han-Gyu and Lee, Bong-Jin},
  booktitle={Proc. ICASSP},
  year={2026}
}
```