Initial roadmap #1
Training recipes
- BVCC (24/05/14)
- SOMOS
- NISQA
- TMHINT-QI
- SingMOS (VMC'24 track2)
- PSTN
- Tencent
Benchmarks
- VMC'22 OOD track (currently having difficulty downloading the data from the BC server)
- VMC'23
- Zoomed-in BVCC (VMC'24 track1)
Classic models
Non-intrusive (single-ended)
- LDNet https://github.com/unilight/LDNet (24/05/14)
- SSL-MOS https://github.com/nii-yamagishilab/mos-finetune-ssl (24/05/22)
- UTMOS https://github.com/sarulab-speech/UTMOS22 (24/06/12)
- RAMP https://arxiv.org/abs/2308.16488
Do I want to implement these methods?
Intrusive (double-ended)
Experimental features
Output
- continuous with L1/L2 loss
- discrete (categorical) with cross-entropy loss
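The two output formulations above can be sketched in plain Python as follows. This is a minimal illustration, not the toolkit's implementation. It assumes MOS on a 1 to 5 scale and, for the discrete variant, a hypothetical binning with one class per integer score. The helper names (`score_to_class`, `expected_score`, `MOS_BINS`) are illustrative. Reading the predicted class distribution back as an expected value is one possible way to turn categorical outputs into a continuous score.

```python
import math
from typing import List, Sequence

# --- Continuous output: regress the MOS directly ---

def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error over a batch of utterance-level scores."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def l2_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over a batch of utterance-level scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# --- Discrete output: classify into MOS bins ---

# Hypothetical binning (assumption): one class per integer MOS value.
MOS_BINS: List[float] = [1.0, 2.0, 3.0, 4.0, 5.0]

def score_to_class(score: float) -> int:
    """Map a continuous MOS label to the index of the nearest bin."""
    return min(range(len(MOS_BINS)), key=lambda i: abs(MOS_BINS[i] - score))

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits: Sequence[float], target_class: int) -> float:
    """-log softmax(logits)[target_class], via a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_class]

def expected_score(logits: Sequence[float]) -> float:
    """Convert class logits back to a scalar MOS (expectation over bins)."""
    return sum(p * b for p, b in zip(softmax(logits), MOS_BINS))
```

For example, a label of 4.2 becomes class index 3 (the 4.0 bin), and uniform logits give an expected score of 3.0.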
Input feature
- semantic SSL
- Linguistic representation from ASR
- Whisper PPG
- sxliu PPG
- audio codec
- general audio representation
- Supported in S3PRL
- SSAST
- CLAP https://github.com/microsoft/CLAP
- Audio-MAE https://github.com/facebookresearch/AudioMAE
- BEATs https://github.com/microsoft/unilm/tree/master/beats
- Supported in S3PRL
Improvements
- In the training loop, automatically save the models that perform best on the dev set
- Model ensemble
- Model averaging
- Inference with external pre-trained models
- Upload, download, and run inference with models trained in this toolkit (where to host them? Hugging Face?)
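The first three improvements above (keeping the best dev-set checkpoints, model ensembling, and model averaging) can be sketched in plain Python as below. This is a hypothetical illustration, not an existing API of this toolkit: `BestCheckpointKeeper`, `ensemble_predict`, and `average_parameters` are made-up names, a model "state" is assumed to be a dict of parameter name to a flat list of floats (a stand-in for a real framework's state dict), and checkpoints are kept in memory rather than written to disk. Here "ensemble" is taken to mean averaging the predictions of several models, and "averaging" to mean element-wise averaging of the weights of several checkpoints that share one architecture; both interpretations are assumptions.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

State = Dict[str, List[float]]  # assumed stand-in for a model state dict

class BestCheckpointKeeper:
    """Keep the k checkpoints with the lowest dev-set loss (hypothetical)."""

    def __init__(self, k: int = 3):
        self.k = k
        # Min-heap on negated loss, so heap[0] is the WORST kept checkpoint.
        # Training steps are assumed unique, so ties never compare states.
        self._heap: List[Tuple[float, int, State]] = []

    def update(self, step: int, dev_loss: float, state: State) -> bool:
        """Offer a checkpoint; return True if it is among the k best so far."""
        item = (-dev_loss, step, state)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
            return True
        if dev_loss < -self._heap[0][0]:  # better than the worst kept one
            heapq.heapreplace(self._heap, item)
            return True
        return False

    def ranked(self) -> List[Tuple[int, float, State]]:
        """Return (step, dev_loss, state) for kept checkpoints, best first."""
        ordered = sorted(self._heap, key=lambda it: -it[0])  # loss ascending
        return [(step, -neg_loss, state) for neg_loss, step, state in ordered]

def ensemble_predict(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average each utterance's predicted MOS across several models."""
    n = len(per_model_scores)
    return [sum(col) / n for col in zip(*per_model_scores)]

def average_parameters(states: Sequence[State]) -> State:
    """Element-wise mean of parameters over checkpoints of one architecture."""
    n = len(states)
    return {
        name: [sum(vals) / n for vals in zip(*(s[name] for s in states))]
        for name in states[0]
    }
```

A possible usage: call `keeper.update(step, dev_loss, state)` after each dev-set evaluation, then pass the states from `keeper.ranked()` to `average_parameters` to build a single averaged model, or run each kept model separately and combine their outputs with `ensemble_predict`.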