Torfinhell/HIFI_GAN

Automatic Speech Recognition (ASR) with PyTorch

About

This repository contains a template for solving the ASR task with PyTorch. This template branch is part of the HSE DLA course ASR homework. Some parts of the code are missing (or do not follow the best design choices), and students are required to fill in these parts themselves, as well as write their own models.

See the task assignment here.

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using conda or venv (+pyenv).

    a. conda version:

    # create env
    conda create -n hifi_gan python=3.11
    # activate env
    conda activate hifi_gan
  2. Install all required packages:

    pip install uv
    uv sync
  3. Install pre-commit:

    pre-commit install

How To Use

To download the model checkpoints and the test dataset, run the following:

uv run download_gdrive.py

To train the model and reproduce the first checkpoint, run the following command:

uv run train.py \
    writer.project_name=HIFI_GAN \
    writer.run_name=HifiGanV1_100k_steps_batch_size_1 \
    writer.mode=online \
    trainer.override=True \
    trainer.epoch_len=1000 \
    trainer.n_epochs=100 \
    dataloader.train.batch_size=1 \
    dataloader.train.num_workers=8 \
    dataloader.inference.num_workers=8 \
    datasets.inference.audio_limit=null \
    datasets.inference.limit=100 \
    model=hifi_gan_v1 \
    -cn=hifi_gan

Here -cn selects a config from src/configs (hifi_gan in this command), and the remaining key=value pairs are optional Hydra config overrides.
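
Each key=value argument in the training command addresses a nested config field through its dotted key. The sketch below is a plain-Python illustration of that mapping, not Hydra's actual parser. The helper name apply_overrides is hypothetical, and the step count assumes that trainer.epoch_len is the number of training steps per epoch, which would match the 100k_steps in the run name.

```python
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Illustrative only: set nested keys from 'a.b.c=value' strings."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(name, {})
        # Convert plain integers; everything else stays a string in this sketch.
        node[leaf] = int(raw_value) if raw_value.isdigit() else raw_value
    return config


overrides = [
    "trainer.epoch_len=1000",
    "trainer.n_epochs=100",
    "dataloader.train.batch_size=1",
    "model=hifi_gan_v1",
]
config = apply_overrides({}, overrides)

# Assumption: epoch_len is the number of training steps per epoch.
total_steps = config["trainer"]["epoch_len"] * config["trainer"]["n_epochs"]
print(total_steps)  # 100000
```

Under that assumption, 100 epochs of 1000 steps give the 100k steps named in the run.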

To run inference on the first checkpoint with the provided input folder:

uv run synthesize.py \
    inferencer.save_path=output/hifi_gan_v1_prev \
    inferencer.from_pretrained=data/models/hifi_gan_first_v1_100k.pth \
    datasets.inference.transcription_dir=data/datasets/synthesize_text/transcriptions \
    -cn=synthesize_prev
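
The command above reads its input from the folder given as datasets.inference.transcription_dir. A minimal sketch for populating such a folder with your own text follows. It assumes the dataset reads one UTF-8 .txt file per utterance and uses the file stem as the utterance ID. That layout and the helper name write_transcriptions are assumptions, not something this README confirms, so check the dataset class in src before relying on them.

```python
from pathlib import Path
import tempfile


def write_transcriptions(texts: dict[str, str], transcription_dir: Path) -> list[Path]:
    """Write one '<utterance_id>.txt' file per entry (assumed layout)."""
    transcription_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for utterance_id, text in sorted(texts.items()):
        path = transcription_dir / f"{utterance_id}.txt"
        path.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(path)
    return written


# Demo in a temporary directory so nothing in the repository is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "synthesize_text" / "transcriptions"
    files = write_transcriptions({"utt_001": "hello what is your name"}, target)
    names = [p.name for p in files]
    first_text = files[0].read_text(encoding="utf-8")

print(names)
```

To use it for real, point the target at data/datasets/synthesize_text/transcriptions instead of the temporary directory.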

To run inference on the first checkpoint with text passed directly on the command line:

uv run synthesize.py \
    inferencer.save_path=output/hifi_gan_v1_prev \
    inferencer.from_pretrained=data/models/hifi_gan_first_v1_100k.pth \
    datasets.inference.transcription_dir=data/datasets/synthesize_text/transcriptions \
    text="hello what is your name" \
    -cn=synthesize_prev

Credits

This repository is based on a PyTorch Project Template.

License

License

About

Implementation of HIFI-GAN from [paper](https://arxiv.org/pdf/2010.05646)
