Torfinhell/AVSS
Audio-Visual Source Separation (AVSS) with PyTorch

About • Installation • How To Use • Credits • License

About

Implementation of AVSS models for a group project, based on the following papers:

RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation.

Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation.

Installation

  1. Install uv

    pip install uv
  2. Install all required packages

    uv init
    uv sync
  3. Download all required models and datasets:

    uv run scripts/download_gdrive.py

How To Use

Before using the model, export the YANDEX_DISK_URL environment variable so it points to the dataset you want to download from Yandex Disk (example with our dataset):

   export YANDEX_DISK_URL=https://disk.360.yandex.ru/d/5pz96ysIZi33IQ
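If you prefer not to modify your shell session, the variable can also be scoped to a single command. A minimal sketch (the inline form is standard POSIX shell behavior, not something the repository's scripts require; the URL is the example dataset from this README):

```shell
# Export for the current shell session; downstream scripts read
# YANDEX_DISK_URL from the environment.
export YANDEX_DISK_URL=https://disk.360.yandex.ru/d/5pz96ysIZi33IQ

# Alternatively, set it for a single command only, e.g.:
# YANDEX_DISK_URL=https://disk.360.yandex.ru/d/5pz96ysIZi33IQ uv run inference.py -cn=inference

# Confirm the variable is visible to child processes.
echo "$YANDEX_DISK_URL"
```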

To train our best RTFS model, run the following command:

   uv run train.py model=rtfs-4-reuse -cn=rtfs-net

To train our best DPRNN model, run the following command:

   uv run train.py -cn=dprnn

More details can be found in the report. To run inference with our best RTFS checkpoint (rtfs-3-reuse) on the Yandex Disk dataset (requires the exported YANDEX_DISK_URL, see above):

   uv run inference.py inferencer.save_path="pred_small_av" inferencer.from_pretrained="data/models/rtfs-3-reuse.pth" download_name=dla_dataset_small_av model=rtfs-3-reuse -cn=inference

To run inference with our best RTFS checkpoint on an already downloaded dataset (YOUR_FOLDER must be inside the data/datasets folder):

   uv run inference.py inferencer.save_path=PRED_FOLDER_NAME inferencer.from_pretrained="data/models/rtfs-3-reuse.pth" download_name=YOUR_FOLDER model=rtfs-3-reuse -cn=inference

Credits

This repository is based on a PyTorch Project Template.

License

