Torfinhell/AVSS
Audio-Visual Source Separation (AVSS) with PyTorch

About • Installation • How To Use • Credits • License

About

Implementation of AVSS models for a group project, based on the following papers:

RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation.

Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation.

Installation

  1. Install uv

    pip install uv
  2. Install all required packages

    uv init
    uv sync
  3. Download all required models and datasets:

    uv run scripts/download_gdrive.py

How To Use

Before using the model, export the YANDEX_DISK_URL environment variable so it points to the dataset you want to download from Yandex Disk (example with our dataset):

   export YANDEX_DISK_URL=https://disk.360.yandex.ru/d/5pz96ysIZi33IQ
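If you prefer not to modify your shell session, the variable can also be scoped to a single command. A minimal sketch (the inline form is standard POSIX shell behavior, not something the repository's scripts require; the URL is the example dataset from this README):

```shell
# Export for the current shell session; downstream scripts read
# YANDEX_DISK_URL from the environment.
export YANDEX_DISK_URL=https://disk.360.yandex.ru/d/5pz96ysIZi33IQ

# Alternatively, set it for a single command only, e.g.:
# YANDEX_DISK_URL=https://disk.360.yandex.ru/d/5pz96ysIZi33IQ uv run inference.py -cn=inference

# Confirm the variable is visible to child processes.
echo "$YANDEX_DISK_URL"
```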

To train our best RTFS model, run the following command:

   uv run train.py model=rtfs-4-reuse -cn=rtfs-net

To train our best DPRNN model, run the following command:

   uv run train.py -cn=dprnn

More details can be found in the report. To run inference with our best RTFS checkpoint (rtfs-3-reuse) on the Yandex Disk dataset (requires the exported YANDEX_DISK_URL, see above):

   uv run inference.py inferencer.save_path="pred_small_av" inferencer.from_pretrained="data/models/rtfs-3-reuse.pth" download_name=dla_dataset_small_av model=rtfs-3-reuse -cn=inference

To run inference with our best RTFS checkpoint on an already downloaded dataset (YOUR_FOLDER must be inside the data/datasets folder):

   uv run inference.py inferencer.save_path=PRED_FOLDER_NAME inferencer.from_pretrained="data/models/rtfs-3-reuse.pth" download_name=YOUR_FOLDER model=rtfs-3-reuse -cn=inference

Credits

This repository is based on a PyTorch Project Template.

License

