About • Installation • How To Use • Credits • License
An implementation of AVSS (audio-visual speech separation) models based on the papers:

- *RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation*
- *Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation*
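For background, both papers evaluate separation quality with scale-invariant SNR (SI-SNR). Below is a minimal NumPy sketch of that metric, for orientation only; it is not code from this repository, and the function name `si_snr` is ours:

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR (dB) between an estimated and a reference signal."""
    # Remove DC offset from both signals
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference: the scale-invariant target
    target = (est @ ref) / (ref @ ref + eps) * ref
    # Everything not explained by the target counts as noise
    noise = est - target
    return 10 * np.log10((target @ target) / (noise @ noise + eps))
```

Because the estimate is projected onto the reference, multiplying the estimate by any nonzero constant leaves the score unchanged, which is why the metric is called scale-invariant.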
---

Install uv:

```shell
pip install uv
```
---

Install all required packages:

```shell
uv init
uv sync
```
---

Download all required models and the dataset:

```shell
uv run scripts/download_gdrive.py
```
Before using the models, set the `YANDEX_DISK_URL` environment variable to the Yandex Disk link for the dataset you want to download (example with our dataset):

```shell
export YANDEX_DISK_URL=https://disk.360.yandex.ru/d/5pz96ysIZi33IQ
```

To train our best RTFS model, run:

```shell
uv run train.py model=rtfs-4-reuse -cn=rtfs-net
```

To train our best DPRNN model, run:

```shell
uv run train.py -cn=dprnn
```

(More details in the report.)

To run inference with our best RTFS checkpoint (`rtfs-3-reuse`) on the Yandex Disk dataset (requires the `YANDEX_DISK_URL` export above):

```shell
uv run inference.py inferencer.save_path="pred_small_av" inferencer.from_pretrained="data/models/rtfs-3-reuse.pth" download_name=dla_dataset_small_av model=rtfs-3-reuse -cn=inference
```

To run inference with the same checkpoint on an already-downloaded dataset (`YOUR_FOLDER` must be inside the `data/datasets` folder):

```shell
uv run inference.py inferencer.save_path=PRED_FOLDER_NAME inferencer.from_pretrained="data/models/rtfs-3-reuse.pth" download_name=YOUR_FOLDER model=rtfs-3-reuse -cn=inference
```

This repository is based on a PyTorch Project Template.