Convert Fairseq data2vec1 to HF

The code in this repository is adapted from the original HuggingFace repository. It contains two scripts that convert a fairseq data2vec1 checkpoint to HuggingFace 🤗 Transformers.

Procedure

Create a HF repo:

huggingface-cli repo create <name_of_model> --organization <org_of_model>
git clone https://huggingface.co/<org_of_model>/<name_of_model>

Convert the model

./run_convert.sh \
    --hf-path </path/to/local/hf/repo> \
    --fairseq-path </path/to/fairseq/checkpoint> \
    --size {base, large} \
    [--dict </path/to/dict>] \
    [--copy-fairseq-model]
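After conversion, a quick sanity check is to confirm that the local HF repo contains the files Transformers needs to load the model. The following is a minimal Python sketch. The expected file names (config.json, preprocessor_config.json, and either pytorch_model.bin or model.safetensors) are assumptions about what the conversion writes, and `missing_files` / `check_converted_repo` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path

# Assumed output files; adjust to whatever the conversion script actually writes.
REQUIRED_FILES = ("config.json", "preprocessor_config.json")
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def missing_files(file_names):
    """Return the expected files that are absent from a collection of file names."""
    present = set(file_names)
    problems = [name for name in REQUIRED_FILES if name not in present]
    # Either weight format is acceptable, so only report if neither is present.
    if present.isdisjoint(WEIGHT_FILES):
        problems.append(" or ".join(WEIGHT_FILES))
    return problems


def check_converted_repo(hf_path):
    """List expected files missing from the local HF repo directory (hypothetical helper)."""
    return missing_files(p.name for p in Path(hf_path).iterdir() if p.is_file())
```

Calling `check_converted_repo` on the directory passed as `--hf-path` should return an empty list when all assumed files are in place.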

Verify that both models behave identically

./run_forward.py \
    --hf-path </path/to/local/hf/repo> \
    --fairseq-path </path/to/fairseq/checkpoint> \
    [--finetuned </path/to/dict>]
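The core of such a check is comparing the two models' outputs on the same input within a numerical tolerance. The sketch below shows only that comparison step, assuming both outputs have already been flattened to plain lists of floats; the helper names and the default tolerance of 1e-3 are assumptions, not values taken from run_forward.py.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length float sequences."""
    if len(a) != len(b):
        raise ValueError(f"shape mismatch: {len(a)} vs {len(b)} elements")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(fairseq_out, hf_out, atol=1e-3):
    """Return True if the two outputs agree within `atol` (assumed tolerance)."""
    diff = max_abs_diff(fairseq_out, hf_out)
    print(f"max absolute difference: {diff:.2e}")
    return diff <= atol
```

With PyTorch tensors, `torch.allclose(fairseq_out, hf_out, atol=1e-3)` performs a similar check (it also applies a small relative tolerance by default).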

Push to the Hub

huggingface-cli upload <org_of_model>/<name_of_model> </path/to/local/hf/repo>

Changelog

convert_data2vec1_audio_original_pytorch_checkpoint_to_pytorch.py (originally from the official huggingface/transformers repository) was modified as follows:

The fairseq model is now imported properly, and class-name clashes are handled correctly.
sampling_rate and do_normalize are both read from the original fairseq configuration (e.g. cfg['task']['sample_rate']) instead of being guessed.
It creates preprocessor_config.json, which the original script did not do for pre-trained (i.e. non-fine-tuned) models.
run_forward.py was adapted from the wav2vec2 conversion script.
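The changelog items about sampling_rate, do_normalize, and preprocessor_config.json can be illustrated with a small sketch. It assumes the fairseq config is a nested mapping whose task section stores `sample_rate` and `normalize` (as fairseq's audio pretraining config does; with a real checkpoint this would typically come from the checkpoint's `cfg` entry). The helper names and the fixed values for `padding_side`, `padding_value`, and `return_attention_mask` are assumptions, not necessarily what the conversion script writes.

```python
import json
from pathlib import Path


def preprocessor_config_from_fairseq(cfg):
    """Build Wav2Vec2FeatureExtractor settings from a fairseq config mapping (hypothetical helper)."""
    task = cfg["task"]
    return {
        "feature_extractor_type": "Wav2Vec2FeatureExtractor",
        "feature_size": 1,  # raw mono waveform input
        "padding_side": "right",
        "padding_value": 0.0,
        "return_attention_mask": True,  # assumption for data2vec audio models
        # Read from the fairseq task config instead of being guessed.
        "sampling_rate": int(task["sample_rate"]),
        "do_normalize": bool(task["normalize"]),  # assumed key name
    }


def write_preprocessor_config(cfg, hf_path):
    """Write preprocessor_config.json into the local HF repo and return its path."""
    path = Path(hf_path) / "preprocessor_config.json"
    path.write_text(json.dumps(preprocessor_config_from_fairseq(cfg), indent=2) + "\n")
    return path
```

The sketch writes the JSON directly to stay dependency-free. With transformers installed, creating a `Wav2Vec2FeatureExtractor` with the same `sampling_rate` and `do_normalize` values and calling its `save_pretrained` method produces a comparable file.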
