Automatic Speech Recognition (ASR) with PyTorch

About • Installation • How To Use • Credits • License

About

This repository contains a template for solving the ASR task with PyTorch. This template branch is part of the HSE DLA course ASR homework. Some parts of the code are missing (or do not follow optimal design choices), and students are required to fill in these parts themselves (as well as write their own models, etc.).

See the task assignment here.

Installation

Follow these steps to install the project:

(Optional) Create and activate a new environment using conda or venv (+pyenv).

a. conda version:

```
# create env
conda create -n hifi_gan python=3.11
# activate env
conda activate hifi_gan
```

Install all required packages:
```
pip install uv
uv sync
```
Install pre-commit:
```
pre-commit install
```

How To Use

To download the model checkpoints and the test dataset, run the following:

```
uv run download_gdrive.py
```
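After the download finishes, a quick sanity check can confirm that the files used by the inference commands below are in place. This is a minimal sketch that is not part of the repository: `missing_assets` and `EXPECTED_PATHS` are hypothetical names, and the two paths are copied from the inference commands in this README on the assumption that `download_gdrive.py` saves files there.

```python
from pathlib import Path

# Assumption: download_gdrive.py saves files at the locations that the
# inference commands in this README read from.
EXPECTED_PATHS = [
    Path("data/models/hifi_gan_first_v1_100k.pth"),
    Path("data/datasets/synthesize_text/transcriptions"),
]


def missing_assets(root: Path = Path("."), expected: list[Path] = EXPECTED_PATHS) -> list[Path]:
    """Return the expected paths that do not exist under `root`."""
    return [path for path in expected if not (root / path).exists()]


if __name__ == "__main__":
    missing = missing_assets()
    for path in missing:
        print(f"missing: {path}")
    if not missing:
        print("all expected assets found")
```

Run it from the repository root; an empty result means both the checkpoint and the transcription folder were found.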

To train the model and reproduce the first checkpoint, run the following command:

```
uv run train.py \
    writer.project_name=HIFI_GAN \
    trainer.override=True \
    writer.run_name=HifiGanV1_100k_steps_batch_size_1 \
    dataloader.train.batch_size=1 \
    trainer.epoch_len=1000 \
    dataloader.train.num_workers=8 \
    dataloader.inference.num_workers=8 \
    writer.mode=online \
    datasets.inference.audio_limit=null \
    datasets.inference.limit=100 \
    trainer.n_epochs=100 \
    model=hifi_gan_v1 \
    -cn=hifi_gan
```

Here `-cn=hifi_gan` selects the config name (a config from `src/configs`), and the `key=value` pairs are optional Hydra overrides.
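Hydra turns each dotted `key=value` override into a value inside a nested config. The sketch below mimics that mapping in plain Python to show how the training overrides are grouped, and then uses them to check the training length. It is a simplified illustration, not Hydra's actual parser: `apply_overrides` and `parse_value` are hypothetical helpers with simplified value rules, and the step count assumes that `trainer.epoch_len` counts training batches per epoch, as the run name `HifiGanV1_100k_steps_batch_size_1` suggests.

```python
from typing import Any


def parse_value(raw: str) -> Any:
    """Convert an override value string to a Python value (simplified rules)."""
    lowered = raw.lower()
    if lowered == "null":
        return None
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # anything else stays a plain string


def apply_overrides(overrides: list[str]) -> dict[str, Any]:
    """Build a nested dict from dotted `key=value` overrides."""
    config: dict[str, Any] = {}
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # descend, creating groups as needed
        node[leaf] = parse_value(raw)
    return config


# Arguments copied from the training command in this README.
ARGS = [
    "writer.project_name=HIFI_GAN",
    "trainer.override=True",
    "writer.run_name=HifiGanV1_100k_steps_batch_size_1",
    "dataloader.train.batch_size=1",
    "trainer.epoch_len=1000",
    "dataloader.train.num_workers=8",
    "dataloader.inference.num_workers=8",
    "writer.mode=online",
    "datasets.inference.audio_limit=null",
    "datasets.inference.limit=100",
    "trainer.n_epochs=100",
    "model=hifi_gan_v1",  # real Hydra treats this as a config-group choice; here it is just a string
    "-cn=hifi_gan",
]

# `-cn` is a Hydra flag (config name), not a config override, so skip it.
overrides = [arg for arg in ARGS if not arg.startswith("-")]
cfg = apply_overrides(overrides)

# Assumption: epoch_len is the number of training batches (steps) per epoch.
total_steps = cfg["trainer"]["epoch_len"] * cfg["trainer"]["n_epochs"]
print(total_steps)  # → 100000
```

With `trainer.epoch_len=1000` and `trainer.n_epochs=100` this gives 1000 × 100 = 100,000 steps, which matches the `100k` in the run name and in the checkpoint file name `hifi_gan_first_v1_100k.pth`.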

To run inference with the first checkpoint on the provided input folder:

```
uv run synthesize.py \
    inferencer.save_path=output/hifi_gan_v1_prev \
    inferencer.from_pretrained=data/models/hifi_gan_first_v1_100k.pth \
    datasets.inference.transcription_dir=data/datasets/synthesize_text/transcriptions \
    -cn=synthesize_prev
```

To run inference with the first checkpoint on a custom text string passed via `text`:

```
uv run synthesize.py \
    inferencer.save_path=output/hifi_gan_v1_prev \
    inferencer.from_pretrained=data/models/hifi_gan_first_v1_100k.pth \
    datasets.inference.transcription_dir=data/datasets/synthesize_text/transcriptions \
    text="hello what is your name" \
    -cn=synthesize_prev
```

Credits

This repository is based on a PyTorch Project Template.

