LLL-Orleans/convert_data2vec_to_hf

Convert Fairseq data2vec1 to HF

The code in this repository was adapted from the official HuggingFace transformers repository. It contains two scripts: one converts a fairseq data2vec1 checkpoint to the HuggingFace 🤗 Transformers format, and the other checks that the converted model produces the same outputs as the original.

Procedure

  1. Create a HF repo and clone it locally:
       huggingface-cli repo create <name_of_model> --organization <org_of_model>
       git clone https://huggingface.co/<org_of_model>/<name_of_model>
  2. Convert the model:
       ./run_convert.sh \
           --hf-path </path/to/local/hf/repo> \
           --fairseq-path </path/to/fairseq/checkpoint> \
           --size {base, large} \
           [--dict </path/to/dict>] \
           [--copy-fairseq-model]
  3. Verify that both models produce the same outputs:
       ./run_forward.py \
           --hf-path </path/to/local/hf/repo> \
           --fairseq-path </path/to/fairseq/checkpoint> \
           [--finetuned </path/to/dict>]
  4. Push to the Hub:
       huggingface-cli upload <org_of_model>/<name_of_model> </path/to/local/hf/repo>
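The exact comparison criterion used by run_forward.py is not described here. A minimal sketch, assuming the check flattens each model's output (for example the last hidden state) into a list of floats and compares the two element-wise against an absolute tolerance, could look like the following. The function names `max_abs_diff` and `outputs_match` and the default tolerance of 1e-3 are hypothetical; with PyTorch tensors the same idea is usually expressed as `torch.allclose(a, b, atol=...)`.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flattened outputs."""
    if len(a) != len(b):
        raise ValueError(f"shape mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a: Sequence[float], b: Sequence[float], atol: float = 1e-3) -> bool:
    """Hypothetical check: every pair of elements differs by at most atol.

    Uses an absolute tolerance only (torch.allclose also adds a relative term).
    A NaN on either side makes the comparison fail, since NaN <= atol is False.
    """
    if len(a) != len(b):
        raise ValueError(f"shape mismatch: {len(a)} vs {len(b)}")
    return all(abs(x - y) <= atol for x, y in zip(a, b))


if __name__ == "__main__":
    # Made-up values standing in for flattened HF and fairseq hidden states
    hf_out = [0.1234, -0.5678, 0.9]
    fairseq_out = [0.1235, -0.5677, 0.9]
    print(f"max_abs_diff = {max_abs_diff(hf_out, fairseq_out):.1e}")
    print("outputs match:", outputs_match(hf_out, fairseq_out))  # outputs match: True
```

Plain Python lists are used only so the sketch runs without PyTorch or fairseq installed; on real model outputs, doing the subtraction with tensor operations is far faster.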

Changelog

Changes relative to the original code, in particular to convert_data2vec1_audio_original_pytorch_checkpoint_to_pytorch.py (originally from the official huggingface/transformers repository):

  1. The fairseq model is now imported properly, and class name clashes are handled correctly.

  2. sampling_rate and do_normalize are both read from the original fairseq configuration (e.g. cfg['task']['sample_rate']) instead of being guessed.

  3. A preprocessor_config.json file is now created; the original script did not create one for pre-trained (i.e. non-fine-tuned) models.

  4. run_forward.py was adapted from the wav2vec2 conversion script.
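Changes 2 and 3 can be illustrated together. The sketch below is a stdlib-only approximation, not the script's actual code: it reads the sample rate from cfg['task']['sample_rate'] (as described above) and, as an assumption, the normalization flag from cfg['task']['normalize'] (the field name used by fairseq's audio pre-training task), then writes both into preprocessor_config.json. The remaining keys and their values (feature_size, padding_value, return_attention_mask, etc.) are assumed defaults modeled on a `Wav2Vec2FeatureExtractor` config; the real script may instead build a `transformers.Wav2Vec2FeatureExtractor` and call its `save_pretrained` method. The helper names `build_preprocessor_config` and `write_preprocessor_config` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Mapping


def build_preprocessor_config(cfg: Mapping[str, Any]) -> dict:
    """Build a feature-extractor config from a fairseq checkpoint's cfg.

    Assumes cfg["task"] holds "sample_rate" and "normalize" (fairseq audio
    pre-training task fields). All other keys are assumed defaults.
    """
    task = cfg["task"]
    return {
        "feature_extractor_type": "Wav2Vec2FeatureExtractor",  # assumed extractor class
        "feature_size": 1,  # raw mono waveform, one value per time step
        "padding_side": "right",
        "padding_value": 0.0,
        "return_attention_mask": True,  # assumption; may depend on the model variant
        # Taken from the original fairseq config instead of being guessed
        "sampling_rate": int(task["sample_rate"]),
        "do_normalize": bool(task["normalize"]),
    }


def write_preprocessor_config(cfg: Mapping[str, Any], hf_path: str) -> Path:
    """Write preprocessor_config.json into the local HF repo and return its path."""
    out = Path(hf_path) / "preprocessor_config.json"
    with out.open("w", encoding="utf-8") as f:
        json.dump(build_preprocessor_config(cfg), f, indent=2, sort_keys=True)
    return out


if __name__ == "__main__":
    import tempfile

    # Toy cfg; a real one would come from the fairseq checkpoint
    # (e.g. torch.load(path)["cfg"], an assumed layout)
    cfg = {"task": {"sample_rate": 16000, "normalize": True}}
    with tempfile.TemporaryDirectory() as tmp:
        path = write_preprocessor_config(cfg, tmp)
        print(json.loads(path.read_text(encoding="utf-8"))["sampling_rate"])  # 16000
```

Reading both values from the checkpoint's own configuration means a model pre-trained at a different sample rate, or without input normalization, gets a matching preprocessor config instead of a hard-coded one.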
