
Merge pytorch/fairseq#4

Open
imonlius wants to merge 707 commits into imonlius:wip from facebookresearch:master

Conversation

@imonlius
Owner

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes # (issue).

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

Weiyi Zheng and others added 30 commits February 12, 2021 14:06
Summary:
OSS removed the 'partition' key in their state dict to accommodate changing partition sizes. This requires an update on the fairseq side to not look into the parameter partition: just broadcast everything and let the optimizer on each rank decide which parameters are relevant.
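A minimal sketch of the idea (the function and key names here are illustrative, not the fairseq/fairscale API; the real code uses torch.distributed broadcasts): every rank receives the full optimizer state, and each rank keeps only the entries for parameters it actually owns.

```python
# Illustrative sketch only; names are made up for this example.
def keep_relevant(full_state, owned_param_ids):
    """Each rank filters the broadcast state down to its own parameters."""
    return {pid: state for pid, state in full_state.items()
            if pid in owned_param_ids}

# e.g. rank owning parameters {0, 2} keeps only those shards:
full = {0: "state0", 1: "state1", 2: "state2"}
local = keep_relevant(full, {0, 2})
```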

This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
…s. (#3237)

Summary:
…ith BLEU scores

# Before submitting

- [no] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [yes] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [no need] Did you make sure to update the docs?
- [no need] Did you write any new necessary tests?

## What does this PR do?
Fixes bugs in evaluation with BLEU score when training with multiple GPUs. No error occurs if there is no distributed training.

When `--eval-bleu` is set to `True` (by default it is `False` and the best checkpoint is selected according to loss) and training uses multiple GPUs (i.e., more than one GPU participates in distributed training), the following error occurs:

```bash
Traceback (most recent call last):
  File "/data/cordercorder/anaconda3/envs/nmt/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 450, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/data1/cordercorder/fairseq/fairseq/distributed/utils.py", line 349, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/data1/cordercorder/fairseq/fairseq/distributed/utils.py", line 326, in distributed_main
    main(cfg, **kwargs)
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 143, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 259, in train
    cfg, trainer, task, epoch_itr, valid_subsets, end_of_epoch
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 345, in validate_and_save
    valid_losses = validate(cfg, trainer, task, epoch_itr, valid_subsets)
  File "/data1/cordercorder/fairseq/fairseq_cli/train.py", line 413, in validate
    trainer.valid_step(sample)
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/data1/cordercorder/fairseq/fairseq/trainer.py", line 834, in valid_step
    logging_output = self._reduce_and_log_stats(logging_outputs, sample_size)
  File "/data1/cordercorder/fairseq/fairseq/trainer.py", line 1157, in _reduce_and_log_stats
    self.task.reduce_metrics(logging_outputs, self.get_criterion())
  File "/data1/cordercorder/fairseq/fairseq/tasks/translation.py", line 410, in reduce_metrics
    metrics.log_scalar("_bleu_counts", np.array(counts))
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/site-packages/torch/tensor.py", line 480, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
(The same traceback is raised on every rank, cuda:0 through cuda:3; the interleaved copies are omitted here.)
Traceback (most recent call last):
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/data/cordercorder/anaconda3/envs/nmt/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/data/cordercorder/anaconda3/envs/nmt/bin/python', '-u', '/data/cordercorder/anaconda3/envs/nmt/bin/fairseq-train', '--local_rank=3', 'tiny_data_bin', '--distributed-world-size', '4', '--arch', 'transformer', '--share-decoder-input-output-embed', '--optimizer', 'adam', '--adam-betas', '(0.9, 0.98)', '--clip-norm', '0.0', '--lr-scheduler', 'inverse_sqrt', '--warmup-init-lr', '1e-07', '--warmup-updates', '3000', '--lr', '0.0005', '--stop-min-lr', '1e-09', '--dropout', '0.25', '--weight-decay', '0.0001', '--criterion', 'label_smoothed_cross_entropy', '--label-smoothing', '0.1', '--max-tokens', '5000', '--batch-size', '64', '--update-freq', '4', '--max-epoch', '30', '--save-dir', 'checkpoint', '--skip-invalid-size-inputs-valid-test', '--eval-bleu', '--eval-bleu-args', '{"beam": 5}', '--eval-bleu-remove-bpe', 'sentencepiece', '--eval-bleu-print-samples', '--eval-tokenized-bleu', '--best-checkpoint-metric', 'bleu', '--maximize-best-checkpoint-metric', '--validate-interval-updates', '1']' returned non-zero exit status 1.

```

The error is caused by the fact that NumPy 1.20.1 does not support code like the following:
```python
import torch
import numpy as np
a = torch.tensor(0, device="cuda:0")
b = np.array([a])
```
The above code raises "TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.", but it runs fine if the NumPy version is 1.18.1 or 1.17.0 (any version below 1.20.0 is probably fine). However, the latest version of fairseq appears to require NumPy 1.20.0 or higher (issue #3203).
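The fix is to move the tensor to host memory before handing it to NumPy. A minimal sketch (here `counts` stands in for the BLEU count tensor from the traceback, placed on the CPU so the snippet is self-contained; on a multi-GPU run it would live on a device such as `cuda:0`):

```python
import torch
import numpy as np

# `counts` stands in for the BLEU count tensor built during validation;
# in the failing run it lives on a CUDA device.
counts = torch.tensor([3, 2, 1])

# np.array(counts) fails on NumPy >= 1.20 when `counts` is a CUDA tensor.
# Copying to host memory first works on every device and NumPy version:
bleu_counts = np.array(counts.cpu())
```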

### Reproduce the error
Download the fairseq source code (commit ID: 7061a0f) and run the following commands:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
data_bin_dir=tiny_data_bin

python -m torch.distributed.launch --nproc_per_node=4 \
    --master_addr="127.0.0.1" \
    --master_port=12345 \
    $(which fairseq-train) ${data_bin_dir} \
    --distributed-world-size 4 \
    --arch transformer \
    --share-decoder-input-output-embed \
    --optimizer adam \
    --adam-betas '(0.9, 0.98)' \
    --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt \
    --warmup-init-lr 1e-07 \
    --warmup-updates 3000 \
    --lr 0.0005 \
    --stop-min-lr 1e-09 \
    --dropout 0.25 \
    --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy \
    --label-smoothing 0.1 \
    --max-tokens 5000 \
    --batch-size 64 \
    --update-freq 4 \
    --max-epoch 30 \
    --save-dir checkpoint \
    --skip-invalid-size-inputs-valid-test \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5}' \
    --eval-bleu-remove-bpe sentencepiece \
    --eval-bleu-print-samples \
    --eval-tokenized-bleu \
    --best-checkpoint-metric bleu \
    --maximize-best-checkpoint-metric \
    --validate-interval-updates 1
```

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3237

Reviewed By: myleott

Differential Revision: D26429732

Pulled By: alexeib

fbshipit-source-id: bc887ce952d28541cb07dbbdc7e80e99428a6b34
Summary:
fixes previous change that changes state/dataset/etc to class variables instead of instance variables

Pull Request resolved: fairinternal/fairseq-py#1623

Reviewed By: michaelauli

Differential Revision: D26439560

Pulled By: alexeib

fbshipit-source-id: ab9e75a425a47ac7ace006419259e254770e560e
…coder (#1559)

Summary:
Pull Request resolved: fairinternal/fairseq-py#1559

This matches the behavior of RobertaEncoder.

Test Plan: Imported from OSS

Reviewed By: gwenzek

Differential Revision: D25936937

Pulled By: myleott

fbshipit-source-id: 795ec8d50298a41d9e9638101436faa01cdf1586
Summary:
This is long overdue, but finally deprecating the RobertaEncoder components and just using TransformerEncoder directly. This will make it easier for some upcoming online backtranslation changes, and will eventually make migrating it to dataclasses/Hydra easier too. It also fixes some longstanding inconsistencies in layernorm placement in the model parallel roberta code.

Pull Request resolved: fairinternal/fairseq-py#1560

Test Plan:
- confirmed that training gives identical losses as before:
https://gist.github.com/myleott/9a4d213fb88a02b00094ea074f5a2e2d
- confirmed that old roberta models can be loaded and produce identical results
- confirmed that old linformer models can be loaded and produce identical results (reran commands from D25938236 (bf54551))
- confirmed that old model parallel models can be loaded and produce identical results:
```
python -m fairseq_cli.validate --path checkpoint.mp1/checkpoint_last.pt --task dummy_masked_lm --criterion masked_lm --max-sentences 8 --dataset-size 100 --model-parallel-size 2 --distributed-world-size 2

before:
2021-01-19 19:04:14 | INFO | valid |  | valid on 'valid' subset | loss 14.62 | ppl 25174.3 | wps 0 | wpb 53248 | bsz 104

after:
2021-01-19 19:06:59 | INFO | valid |  | valid on 'valid' subset | loss 14.62 | ppl 25174.3 | wps 0 | wpb 53248 | bsz 104
```

Reviewed By: gwenzek, ngoyal2707

Differential Revision: D25937145

Pulled By: myleott

fbshipit-source-id: 1ce0bc93e28e03fb926534ea4134684a49232599
Summary: Pull Request resolved: fairinternal/fairseq-py#1570

Test Plan: Imported from OSS

Reviewed By: gwenzek, ngoyal2707

Differential Revision: D25967675

Pulled By: myleott

fbshipit-source-id: 7c7f8d25b87ef9b4f0a85331548bb3a2886a1e92
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: fairinternal/fairseq-py#1629

Reviewed By: myleott

Differential Revision: D26484942

Pulled By: sshleifer

fbshipit-source-id: 9dcbab5c404c14d8f35628d823102ad9ce59dffd
Summary:
Integrating LASER (Language-Agnostic SEntence Representations) training code

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [N/A] Did you make sure to update the docs?
- [Y] Did you write any new necessary tests? => an additional test in `test_iterators.py`

## What does this PR do?

This diff introduces the training code for LASER.
It includes a specific `laser` task in `laser_task.py` which reads a
json configuration file describing the binarized datasets of language
pairs.

`multitask_data_utils.py` defines dataset wrappers and iterators used by
`laser` task.
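The PR text does not show the configuration format, but a hypothetical example of what such a json file might look like (the field names here are guesses for illustration only, not the actual `laser` task schema) could be parsed as follows:

```python
import json

# Hypothetical shape of a LASER task config: a list of binarized
# language-pair datasets. Field names are illustrative assumptions.
cfg = json.loads("""
{
  "train": [
    {"src": "en", "tgt": "fr", "data": "data-bin/en-fr"},
    {"src": "en", "tgt": "de", "data": "data-bin/en-de"}
  ]
}
""")
pairs = [(d["src"], d["tgt"]) for d in cfg["train"]]
```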

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Yes. 🙃

Pull Request resolved: fairinternal/fairseq-py#1207

Reviewed By: myleott

Differential Revision: D26454296

Pulled By: Celebio

fbshipit-source-id: c987672aa66abf31b039ee11867b06912d3486e5
Summary:
Add back a couple speed optimizations in the original roberta code that got lost in the refactor

Pull Request resolved: fairinternal/fairseq-py#1626

Reviewed By: gwenzek

Differential Revision: D26478534

Pulled By: myleott

fbshipit-source-id: b945de5e9bffd51cd63630cc3aa1f0078a41cca8
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
- updates audio_utils to handle multi-channel audio as well as mono, with no change needed for existing recipes
- adds speech-to-text example for Multilingual TEDx (http://openslr.org/100) data

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3253

Reviewed By: yuntang

Differential Revision: D26514419

Pulled By: kahne

fbshipit-source-id: 699e428affda5b1347f96a8310691ab152dd6769
Summary: after D26382917 (02803a1) shipped, self._device was somehow removed in the optimizer (or maybe I didn't test it the right way in the previous diff?). Fortunately, OSS doesn't need it anyway.

Reviewed By: myleott

Differential Revision: D26523538

fbshipit-source-id: 637c1e344670340ae40b32635ef51f5501966b0c
Summary:
This is the pull request for the code for the paper
[SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation](https://www.aclweb.org/anthology/2020.aacl-main.58/)

The model will also be used for [IWSLT 2021 shared task on simultaneous translation
](https://iwslt.org/2021/simultaneous)
This pull request includes

- Convtransformer offline model
- Convtransformer simultaneous translation model with fixed pre-decision module
- The agent files for inference for the convtransformer simultaneous translation model

jmp84: The README is still missing; just curious, where should I place it?

Pull Request resolved: fairinternal/fairseq-py#1607

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

**********
One of the failing landing integration tests
```
buck test mode/dev //multimo/fb/models/test:multimo_fb_model_test
https://fburl.com/testinfra/oxq2cn5n
```

Reviewed By: jmp84

Differential Revision: D26439663

Pulled By: sravyapopuri388

fbshipit-source-id: b127cb4962756af221b65e3ccb6598a42fc75f7f
Summary:
This diff integrates simul ST training into pyspeech with very minor modifications to the open sourced code. Specific changes made are
- In fixed_pre_decision.py remove self as argument to p_choose function as it is already called with super in line 101
- In monotonic_multihead_attention.py remove pdb.set_trace()
- Move label_smoothed_cross_entropy_latency_augmented.py to fairseq/criterions folder and add missing arguments to parser
- In fairseq/data/data_utils.py type cast max_tokens to int to avoid type error.
- Update fairseq/convtransformer.py to pyspeech/convtransformer.py

# Next steps:
- Verify decoding using the model trained
- Support everstore handle based decoding in simuleval and integrate it into pyspeech.

Reviewed By: jmp84

Differential Revision: D26478861

fbshipit-source-id: 3b02b2aee757e5464b71dbdd7ebdba42659faee5
Summary:
Fix LibriSpeech data prep script
* Lowercasing transcript to be consistent with the pre-trained models

Reviewed By: jmp84

Differential Revision: D26538845

fbshipit-source-id: 0885f99e2c85f0e722a24f3cb83f2635ce9429bc
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes KeyError mentioned in  # (3211).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3212

Reviewed By: alexeib

Differential Revision: D26513255

Pulled By: myleott

fbshipit-source-id: 5a11cb369c9d4202fab6998d269e7da5f3d3e534
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes #3178 (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃 (I did ;)

Pull Request resolved: #3249

Reviewed By: alexeib

Differential Revision: D26513275

Pulled By: myleott

fbshipit-source-id: 2785098a945404c07eb72c079177654b1739a7a2
Summary:
I tried resuming a run from a checkpoint in f250883864, but ran into:

AssertionError: Criterion does not match; please reset the optimizer (--reset-optimizer). DistributedTimeoutWrapper vs ContrastiveLabelsCriterion

Based on this, I believe that since D25836853 (d68a353) we are no longer saving the actual criterion's name in the checkpoint, but DistributedTimeoutWrapper instead.

This is kind of weird though, as I would expect more people to run into this issue. Not sure if I am doing something wrong, let me know if so, thanks!
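A toy illustration of what seems to be going on (the classes and the `module` attribute below are stand-ins, not the actual fairseq internals): reading `type(criterion).__name__` after wrapping records the wrapper's class name, so the checkpoint comparison fails; unwrapping before reading the name restores the expected behavior.

```python
class DistributedTimeoutWrapper:
    """Stand-in for fairseq's wrapper around the criterion."""
    def __init__(self, module):
        self.module = module

class ContrastiveLabelsCriterion:
    """Stand-in for the real criterion class."""

def criterion_name(criterion):
    # Unwrap before reading the class name, so the checkpoint stores the
    # real criterion's name rather than the wrapper's.
    while isinstance(criterion, DistributedTimeoutWrapper):
        criterion = criterion.module
    return type(criterion).__name__
```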

Reviewed By: myleott

Differential Revision: D26478656

fbshipit-source-id: bc3c7c925f5505140d9df4438af3a73d65d4f531
Summary: Pull Request resolved: fairinternal/fairseq-py#1636

Reviewed By: xutaima

Differential Revision: D26562816

Pulled By: jmp84

fbshipit-source-id: 4e6efd0b4236d7187bd365d790f260bd5297aed5
…9) (#3235)

Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [N/A] Did you make sure to update the docs?
- [N/A] Did you write any new necessary tests?

## What does this PR do?

Currently when installing the newest source package from PyPI I get an error like so:

```
Collecting fairseq
  Using cached fairseq-0.10.2.tar.gz (938 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /home/frankier/sources/datasets/.venv/bin/python3 /tmp/tmp_ujftsgi_in_process.py get_requires_for_build_wheel /tmp/tmpmn0eumq2
       cwd: /tmp/pip-install-dg5d6q9y/fairseq
  Complete output (31 lines):
  Traceback (most recent call last):
    File "setup.py", line 214, in <module>
      do_setup(package_data)
    File "setup.py", line 136, in do_setup
      setup(
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 152, in setup
      _install_setup_requires(attrs)
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 147, in _install_setup_requires
      dist.fetch_build_eggs(dist.setup_requires)
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 60, in fetch_build_eggs
      raise SetupRequirementsError(specifier_list)
  setuptools.build_meta.SetupRequirementsError: ['cython', 'numpy', 'setuptools>=18.0']

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/tmp/tmp_ujftsgi_in_process.py", line 280, in <module>
      main()
    File "/tmp/tmp_ujftsgi_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/tmp/tmp_ujftsgi_in_process.py", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 145, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 217, in <module>
      os.unlink(fairseq_examples)
  IsADirectoryError: [Errno 21] Is a directory: 'fairseq/examples'
  ----------------------------------------
ERROR: Command errored out with exit status 1: /home/frankier/sources/datasets/.venv/bin/python3 /tmp/tmp_ujftsgi_in_process.py get_requires_for_build_wheel /tmp/tmpmn0eumq2 Check the logs for full command output.
```

I believe the reason for this is that the source package contains the examples directory because it was put there during package creation (the symlink appears to have become a real directory). When setup.py runs again, it attempts to unlink the directory, which fails because a directory cannot be removed with os.unlink. This PR therefore only attempts to unlink it if it is a symlink. I have not thoroughly tested whether my proposed cause is the true cause, but this should fix the error in any case.

Note that the source package is fetched because there is no wheel for Python 3.9, so most users will not see this because they will use the wheel.
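A minimal sketch of the proposed guard (the function name and default path are illustrative; the actual change lives inline in fairseq's setup.py): only unlink `fairseq/examples` when it is actually a symlink, so an sdist build where it has become a real directory no longer crashes.

```python
import os

def remove_examples_link(path="fairseq/examples"):
    # os.unlink can remove a symlink but raises IsADirectoryError on a
    # real directory, so guard with islink before attempting it.
    if os.path.islink(path):
        os.unlink(path)
```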

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3235

Reviewed By: alexeib

Differential Revision: D26513259

Pulled By: myleott

fbshipit-source-id: 775d6c636a5867b9983bb6419829f13ee414e2fd
Summary:
# Before submitting

- [NO] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [YES] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [YES] Did you make sure to update the docs?
- [NO] Did you write any new necessary tests?

## What does this PR do?

This is a typo fix to the Hydra Integration doc: the example with a dataclass config should use `FairseqTask`, not `LegacyFairseqTask`.

Didn't make an issue for this as it's a trivial doc change for the example to match the actual doc.

Pull Request resolved: fairinternal/fairseq-py#1619

Reviewed By: huihuifan

Differential Revision: D26448855

Pulled By: Mortimerp9

fbshipit-source-id: 467323101b8425370f6bd7c0532e70abb319b337
Summary:
This diff
1. Updates FairseqSimulSTAgent to make it generic and reusable internally [Touches OSS]
2. Adds FBFairseqSimulSTAgent inheriting FairseqSimulSTAgent
3. Add TARGETS file in examples/speech_to_text
4. Update simuleval TARGETS and add a bento kernel for easy testing

Reviewed By: jmp84

Differential Revision: D26573214

fbshipit-source-id: f4b71f90693cc878cc771b46a006bcbc83a50124
…3247)

Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

Fixes #3246
Fixes #3248

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3247

Reviewed By: myleott

Differential Revision: D26513267

Pulled By: lematt1991

fbshipit-source-id: 958de0b3a58a0dd2a56bd6c6d7fb2644a89f6746
Summary:
# Before submitting

- [N] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [Y] Did you make sure to update the docs?
- [N] Did you write any new necessary tests?

## What does this PR do?
Small fixes in the script and documentation for correctly reproducing the results in the corresponding paper.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3264

Reviewed By: lematt1991

Differential Revision: D26587397

Pulled By: myleott

fbshipit-source-id: 3675ec4d4388cafa224d395e08b53667f142cb27
Summary:
- Use `PathManager.ls` instead of `os.listdir`
- Add version.txt to fairseq TARGETS

Reviewed By: vishrav

Differential Revision: D26579091

fbshipit-source-id: 20d57dc19335a3006cd5fa6d1a3d5e878b105874
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
fixes a circular import reported by Python

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3257

Reviewed By: jmp84

Differential Revision: D26587382

Pulled By: myleott

fbshipit-source-id: a8a6e7bee4dcfa6baf934c257958b7d7592205c8
Summary:
Batch level sampling (each batch comes from a dataset sampled from some distribution) is useful in cases where we have a criterion that makes this assumption or a unique collator per dataset. However, the current implementation in fairseq `MultiCorpusSampledDataset` is inefficient, because it packs batches by assuming the size of item i is `max(dataset.size(i % len(dataset)) for dataset in datasets)`, which often significantly overestimates the actual sampled item's size, especially with many datasets.

We can make this more efficient by modifying `MultiCorpusDataset`, which can do efficient batch sampling by:

1. Every epoch, sampling the indices/dataset to train on.
2. When creating batches, create per-dataset batches and merge them together
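The two steps above can be sketched as follows. This is a minimal illustration only — the function names, the weighted-sampling scheme, and the fixed batch size are assumptions for the sketch, not the actual `MultiCorpusDataset` API:

```python
import random

def sample_epoch_indices(dataset_sizes, weights, num_samples, seed=0):
    """Step 1: at the start of each epoch, sample which dataset each index comes from."""
    rng = random.Random(seed)
    picks = rng.choices(range(len(dataset_sizes)), weights=weights, k=num_samples)
    per_dataset = {i: [] for i in range(len(dataset_sizes))}
    for ds_idx in picks:
        per_dataset[ds_idx].append(rng.randrange(dataset_sizes[ds_idx]))
    return per_dataset

def make_batches(per_dataset, batch_size, seed=0):
    """Step 2: batch within each dataset, then merge and shuffle the batch list.

    Each batch draws from a single dataset, so a per-dataset collator applies
    cleanly and item sizes are never overestimated across datasets.
    """
    batches = []
    for ds_idx, indices in per_dataset.items():
        for start in range(0, len(indices), batch_size):
            batches.append((ds_idx, indices[start:start + batch_size]))
    random.Random(seed).shuffle(batches)  # interleave datasets across the epoch
    return batches
```

Because every batch references exactly one dataset, the size of each sampled item is known exactly, avoiding the `max(...)` overestimate described above.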

Reviewed By: jay-mahadeokar

Differential Revision: D26601515

fbshipit-source-id: a3273f88d86d7922f9ba004e7324e909ecc6ecf7
Summary:
### Measurements
TLDR: This saves ~8% CPU RAM when training a tiny model on a medium-sized dataset (11GB on disk)

Command below:

```
+---------------------+----------------+---------+--------+
| fname               |   cpu_mem_used |     wps |    ppl |
+=====================+================+=========+========+
| branch_nw8_2gpu.log |          25.41 | 54721   | 429.1  |
+---------------------+----------------+---------+--------+
| master_nw8_2gpu.log |          27.53 | 52833.1 | 429.1  |
+---------------------+----------------+---------+--------+
```

### Command

```
base_cmd () {
  dd=$1
  shift
  fairseq-train --fp16 $dd \
            --task language_modeling \
            --arch transformer_lm_gpt2_tiny \
            --sample-break-mode complete --tokens-per-sample 512 \
            --optimizer adam --clip-norm 0.0 --lr 0.0005 \
            --batch-size 1 \
            --max-update 200 --max-epoch 1 \
            --log-format simple --log-interval 100 \
            --restore-file x.pt --no-save \
            --skip-invalid-size-inputs-valid-test --disable-validation $@
}
CUDA_VISIBLE_DEVICES=0,1 base_cmd /private/home/sshleifer/data-bin/stories_mmap --num-workers 8
```

Pull Request resolved: fairinternal/fairseq-py#1647

Reviewed By: myleott

Differential Revision: D26628861

Pulled By: sshleifer

fbshipit-source-id: 142afe0358d1c4cae448828ba811b211406509d7
Summary: Pull Request resolved: fairinternal/fairseq-py#1641

Reviewed By: myleott

Differential Revision: D26607648

Pulled By: sshleifer

fbshipit-source-id: 9d7f9d7a0825e3124c181b651a126842e5de6109
Summary: Pull Request resolved: fairinternal/fairseq-py#1637

Test Plan:
```bash
python examples/bart/summarize.py --model-dir pytorch/fairseq --model-file bart.large.cnn --src $HOME/data-bin/cnn_dm/test.source --n 12 --out hub_hypo.txt

python examples/bart/summarize.py \
  --model-dir pytorch/fairseq \
  --model-file bart.large.cnn \
  --src cnn_dm/test.source \
  --out cnn_dm/test.hypo --xsum-kwargs
```

Reviewed By: ngoyal2707

Differential Revision: D26581703

Pulled By: sshleifer

fbshipit-source-id: 80eb28012f7770eee01ed50a1163c5a2c5cc6d37
Summary: Pull Request resolved: fairinternal/fairseq-py#1649

Reviewed By: stephenroller

Differential Revision: D26639303

Pulled By: myleott

fbshipit-source-id: 7def925cd7885cfe85d542464316cbc0f2ba6d2c
EdanSneh and others added 29 commits August 2, 2021 18:40
Summary: Adding the fairseq entrypoint section of the e2e pipeline so that passing FairseqConfig to hydra_main runs smoothly

Reviewed By: jieru-hu

Differential Revision: D29714729

fbshipit-source-id: e3694e0037bb4c4f69208c1d6ec7df91d42fb588
Summary: Implemented fixed-bit scalar quantization with quant noise for pytext models

Reviewed By: AkshatSh

Differential Revision: D29662977

fbshipit-source-id: ebab68a4a5ff1583a0c6dfadcf2671663e232c18
Summary:
- stores exp_avg and exp_sq_avg in fp16, with `scale` variables to avoid overflow.
- myleott added this to gshard, following github.com/openai/jukebox/blob/master/jukebox/utils/fp16.py
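A minimal sketch of the fp16-with-scale trick: store the state tensor in fp16 together with a scalar `scale` chosen so the stored values cannot overflow the fp16 range. Illustrated here with NumPy rather than torch; the headroom constant and function names are assumptions, not the actual implementation:

```python
import numpy as np

FP16_HEADROOM = 32000.0  # keep stored magnitudes well under the fp16 max (~65504)

def to_fp16_with_scale(t):
    """Store a tensor in fp16 plus a scalar scale, so large optimizer-state
    values (exp_avg, exp_sq_avg) don't overflow the fp16 range."""
    scale = max(float(np.abs(t).max()) / FP16_HEADROOM, 1e-8)
    return (t / scale).astype(np.float16), scale

def from_fp16_with_scale(t16, scale):
    """Recover an fp32 view before the optimizer update reads the state."""
    return t16.astype(np.float32) * scale
```

The roundtrip loses only fp16 precision (~0.05% relative error) while halving the memory of the optimizer state tensors.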

Pull Request resolved: fairinternal/fairseq-py#2139

Reviewed By: myleott

Differential Revision: D30113175

Pulled By: sshleifer

fbshipit-source-id: 03995c8eb096629675eadec4e7b8e7f18fc2730e
Summary:
## What does this PR do?
Adds GSLM directory with README.

Pull Request resolved: fairinternal/fairseq-py#2151

Reviewed By: wnhsu

Differential Revision: D30147672

Pulled By: hikushalhere

fbshipit-source-id: bcc7cbbde3626ea3d91917707a91aff85d715baa
Summary:
adds finetuned robust w2v models and updates readme

fixes #3721

Pull Request resolved: fairinternal/fairseq-py#2196

Reviewed By: wnhsu

Differential Revision: D30367999

Pulled By: alexeib

fbshipit-source-id: 616b373bf31265c89f694fba7dccce2961d394f3
Summary:
## What does this PR do?
Fixes an OOM that happens on TPUs when dynamic batching exceeds the maximum a single core can work with.

Pull Request resolved: #3781

Reviewed By: wnhsu

Differential Revision: D30327091

Pulled By: alexeib

fbshipit-source-id: 0ebe6b18329fa05d359083fa8ac54aba7b48bc53
Summary:
Fix fairinternal/fairseq-py#2177 for the transformer conversion to Hydra.

The way the defaults are dealt with now is different, so when you use the legacy Namespace configuration, you end up with a default encoder_embed_dim, which in the VGG case sets up an encoder attention in the TransformerDecoderLayer with the wrong dimensions.
The easiest solution is to erase the default value for encoder_embed_dim (by forcing it to None) when converting the VGG config to the raw Namespace for the decoder layer.

Tested with:
`pytest tests/speech_recognition/test_vggtransformer.py -k Transformer`

Pull Request resolved: fairinternal/fairseq-py#2213

Test Plan: pytest tests/speech_recognition/test_vggtransformer.py -k Transformer

Reviewed By: sshleifer

Differential Revision: D30425143

Pulled By: Mortimerp9

fbshipit-source-id: 92f6dea2ffbb68e441700bcc55274b3167a587b3
Summary:
## What does this PR do?
Open sourcing code for Generative Spoken Language Modeling

Pull Request resolved: fairinternal/fairseq-py#2201

Reviewed By: wnhsu, eugene-kharitonov

Differential Revision: D30563114

Pulled By: hikushalhere

fbshipit-source-id: 6c1ee3b29038fd2c9fb5939bddcc70af0794dab4
Summary: Pull Request resolved: fairinternal/fairseq-py#2239

Reviewed By: sshleifer, ngoyal2707

Differential Revision: D30574791

Pulled By: myleott

fbshipit-source-id: 0f83e6ffe53d608292545884df269a604a57448d
Summary:
1. added a test for generating pad tokens during beam search with prefix tokens
2. modified lprobs for the pad token and prefix tokens to avoid generating pad
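The lprobs modification can be sketched as follows — a simplified single-step illustration with hypothetical names, not the actual fairseq sequence generator code:

```python
import math

def apply_prefix_constraint(lprobs, prefix_token, pad_idx):
    """Mask one step's log-probabilities so beam search must emit the given
    prefix token and can never emit pad while prefix tokens remain."""
    out = [-math.inf] * len(lprobs)
    if prefix_token != pad_idx:
        # only the forced prefix token keeps its score; pad stays at -inf
        out[prefix_token] = lprobs[prefix_token]
    return out
```

Setting everything else to `-inf` (rather than to the minimum observed lprob) guarantees no other token, pad included, can outscore the forced prefix token.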

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: fairinternal/fairseq-py#2227

Reviewed By: xianxl

Differential Revision: D30649356

Pulled By: jingfeidu

fbshipit-source-id: d94903a912e767391c8fca61f98f65b5cea3b56e
Summary:
Pull Request resolved: fairinternal/fairseq-py#2236

The test_eval_bleu unittest in TestTranslation in tests/test_binaries.py failed after the sacrebleu version was updated to 2.0.0 in the OSS testing tool. Added a fix so that the test passes with both sacrebleu 1.x and 2.0.0.

Reviewed By: myleott, sravyapopuri388

Differential Revision: D30525920

fbshipit-source-id: 8ef27509cec45422a8d22003c87c2a7acb55225d
Summary:
## What does this PR do?

Currently, binarized datasets are stored as a bin representation of int tensors. At best, each int is coded as uint16 on disk.

When coding a fixed-size-vocabulary dataset where we know the frequency of each symbol and where some symbols are more common than others, we can do better. This happens in particular when binarizing a dataset split into subword units, as the most common "tokenizers" like bpe and spm will choose subwords with high frequencies over subwords with low frequencies.

In practice, if we know the frequency of all symbols (or a good estimate), we can use entropy encoding methods to compress the data. The idea is to assign a compressed representation where frequent symbols have shorter representations than infrequent symbols.

In this PR, we build a Huffman code from a frequency table and use this code to encode a dataset. The PR provides the huffman coder implementation (using the single queue approach as we usually start with a sorted set of symbols) as well as a memory map implementation of a dataset that stores the data compressed with a huffman code and can return indexed tensors from it.

Over a whole dataset, depending on how many symbols we sample to evaluate the frequency, we can save between 25% and 30% of storage space.
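As a rough illustration of the idea, a Huffman code over a frequency table can be built in a few lines. This sketch uses a heap rather than the single-queue construction mentioned above, and the function name is hypothetical — it is not the fairseq `HuffmanCodeBuilder`:

```python
import heapq
from collections import Counter

def build_huffman_code(freqs):
    """Build a prefix-free binary code from a {symbol: count} table.

    Frequent symbols get shorter codewords, which is where the 25-30%
    storage savings on subword-tokenized data comes from.
    """
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # heap entries: (total_count, tiebreak, {symbol: partial codeword})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        # merge the two rarest subtrees, prefixing their codewords with 0/1
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]
```

The resulting code is prefix-free, so an encoded stream can be decoded unambiguously without separators.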

## Follow Ups

currently the binarizer/preprocess scripts make too many assumptions about the dataset writers, so the huffman dataset writer cannot be used straight out of the box with them. I will make follow-up PRs to provide easy-to-use scripts to build such datasets. But it's as simple as doing:
```
# first pass: estimate symbol frequencies to build the code
code_builder = HuffmanCodeBuilder()
with open(sample_file, 'r', encoding="utf-8") as f:
    for line in f:
        code_builder.add(*line.strip().split(" "))

coder = code_builder.build_code()

# second pass: encode the dataset with the resulting Huffman code
with HuffmanMMapIndexedDatasetBuilder('/tmp/testing_huffman', coder) as builder:
    with open(dataset_file, 'r', encoding="utf-8") as f:
        for line in f:
            builder.add_item(line.strip().split(' '))
```

a lot of the `HuffmanMMapIndexedDataset` code comes from the normal `MMapIndexedDataset`, and we could probably extract the commonalities into a base class

the `HuffmanCoder` is also really a special kind of `Dictionary` and again, a common base class could be abstracted out of them.

Pull Request resolved: fairinternal/fairseq-py#2029

Reviewed By: dianaml0

Differential Revision: D29557468

Pulled By: Mortimerp9

fbshipit-source-id: a01b6d98f38f937934cadebb3786133e257adefe
Summary:
Adds Exponential moving average (EMA) model for Kaizen semi-supervised training https://arxiv.org/abs/2106.07759

1. Add `ema.store_ema` to enable storing EMA. EMA will be written to extra_state in the state dict while saving checkpoint.
2. `ema.ema_start_update` to control when the EMA starts accumulating
3. Tasks can use `uses_ema` property to decide if the EMA should be passed to the task. (Default is False)
4. `load_ema_from_checkpoint` can be used to load the EMA model in place of the model to be used for evaluation. Pyspeech has an eval-ema option for this.

```
This module has the EMA class used to store a copy of the exponentially decayed
model params.

Typical usage of EMA class involves initializing an object using an existing
model (random or from a seed model) and setting the config like ema_decay,
ema_start_update which determine how the EMA model is updated. After every
update of the model i.e. at the end of the train_step, the EMA should be updated
by passing the new model to the EMA.step function. The EMA model state dict
can be stored in the extra state under the key of "ema" and dumped
into a checkpoint and loaded. The EMA object can be passed to tasks
by setting task.uses_ema property.
EMA is a smoothed/ensemble model which might have better performance
when used for inference or further fine-tuning. EMA class has a
reverse function to load the EMA params into a model and use it
like a regular model.
```
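The update rule described above can be shown on plain per-parameter floats. The function signature is hypothetical — the real EMA class operates on model state dicts and torch tensors:

```python
def ema_step(ema_state, model_state, decay=0.999, num_updates=0, ema_start_update=0):
    """One EMA update after a train step: ema = decay * ema + (1 - decay) * model.

    Before ema_start_update the decay is forced to 0, so the EMA simply
    copies the model until accumulation begins.
    """
    d = decay if num_updates >= ema_start_update else 0.0
    return {k: d * ema_state[k] + (1.0 - d) * v for k, v in model_state.items()}
```

Called at the end of every train_step, this yields the smoothed/ensemble parameters that are then stored under the "ema" key in extra_state.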

Reviewed By: cruvadom

Differential Revision: D24238379

fbshipit-source-id: 879d3ba5070a614b7d365f9503af357001e875b2
…ributional Hypothesis" (#1930)

Summary:
Paper submitted to EMNLP: https://arxiv.org/abs/2104.06644

Pull Request resolved: fairinternal/fairseq-py#1930

Reviewed By: lematt1991

Differential Revision: D28885634

Pulled By: shruti-bh

fbshipit-source-id: d433c87cff3603b3e676a129029a827c510a72c7
Summary:
# Before submitting
the default score was set as the min score of all lprobs, which would let us select tokens other than prefix tokens during beam search; this change uses a (pretty hacky) way to make it smaller than any lprob.
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: fairinternal/fairseq-py#2267

Reviewed By: myleott

Differential Revision: D30730475

Pulled By: jingfeidu

fbshipit-source-id: 7dab4e9ed2fc094910467bad776155230987e21a
Summary: As title

Reviewed By: zhengwy888, xiaoxiao26

Differential Revision: D30621478

fbshipit-source-id: d79aba3f98d39a5c46a53bf206522c5f7d05e02a
Summary:
## TL;DR
Fairseq checkpoint saving and loading should mirror torch's checkpoint by saving and loading "state_dict()._metadata".

## Long Story:

#### What happened:
During model loading and saving, quantization-aware-training models in PyTorch encounter a weird bug that says state_dict "fake_weight_quant.weight.min_val" is mismatched to "min_vals".

#### What was the reason:
- We found that torch uses state_dict()._metadata to store module._version, but the metadata was never stored in the checkpoint, nor is it loaded during checkpoint loading in fairseq.
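A sketch of the mirrored save/load. The checkpoint key `model_metadata` is a hypothetical name for illustration; only the idea — persisting `_metadata` alongside the weights so per-module versions survive the roundtrip — reflects the bug described above:

```python
from collections import OrderedDict

def save_with_metadata(state_dict):
    """Build a checkpoint that keeps state_dict()._metadata.

    torch stores per-module versions in the `_metadata` attribute of the
    state dict; copying into a plain dict silently drops that attribute.
    """
    checkpoint = {"model": dict(state_dict)}
    metadata = getattr(state_dict, "_metadata", None)
    if metadata is not None:
        checkpoint["model_metadata"] = metadata
    return checkpoint

def load_with_metadata(checkpoint):
    """Restore the state dict and re-attach _metadata before load_state_dict."""
    state = OrderedDict(checkpoint["model"])
    if "model_metadata" in checkpoint:
        state._metadata = checkpoint["model_metadata"]
    return state
```

With `_metadata` re-attached, version-dependent key remapping (such as min_val vs. min_vals in quantization observers) resolves correctly during loading.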

Reviewed By: frankseide

Differential Revision: D30649933

fbshipit-source-id: ce262486b9b95fbcece463fa05c4e1903d4232d7
Summary:
1) add annotation for encoder_out
2) force dropout to be float for jitable purpose.

Reviewed By: cndn

Differential Revision: D30826657

fbshipit-source-id: aca79845d7ae48d450b602a7be8f56404f4c7bab
Summary: [fairseq-py] add speech synthesis preprocessing and evaluation scripts

Reviewed By: wnhsu

Differential Revision: D30720282

fbshipit-source-id: 6e4b098b6f56fff41b82af4347518d7f7905c801
Summary: [fairseq-py] update S2T

Reviewed By: wnhsu

Differential Revision: D30720434

fbshipit-source-id: dc4e46b0cc3dec24943baeabe59424dabd5be38f
…ict" (#3861)

Summary:
Pull Request resolved: #3861

Back out the fairseq changes; fix with the suggested, more optimal changes in checkpoint utils.

Reviewed By: zhengwy888

Differential Revision: D30886481

fbshipit-source-id: 12b6dd4d5107ab4371b73a58d9a044a17c733260
Summary:
Pull Request resolved: #3862

We resolved a bug for the missing "_metadata" attribute for pytorch models during checkpoint saving and loading by forcing state["model"]["_metadata"], but it's not an efficient solution due to the expensive model.state_dict() invocation. This diff offers an alternative solution.

Reviewed By: zhengwy888

Differential Revision: D30857147

fbshipit-source-id: 5daa978e2a558ad4159e2da55470253950151bde
Summary: [fairseq-py] add TTS

Reviewed By: wnhsu

Differential Revision: D30720666

fbshipit-source-id: b5288acec72bea1d3a9f3884a4ed51b616c7a403
Summary: Aligned training was not using batch_by_size in the dataset. Due to this, it was not possible to use batch sampling in MultiCorpusDataset with different transforms and collators for different datasets.
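For context, `batch_by_size` greedily groups indices into batches whose padded token count stays under a cap. A simplified sketch — not fairseq's actual implementation, which also handles max-sentences limits and size multiples:

```python
def batch_by_size(indices, num_tokens_fn, max_tokens):
    """Greedily group indices into batches whose padded size
    (batch length x longest item in the batch) stays under max_tokens."""
    batches, cur, cur_max = [], [], 0
    for idx in indices:
        n = num_tokens_fn(idx)
        new_max = max(cur_max, n)
        # adding idx would pad every item in the batch up to new_max tokens
        if cur and new_max * (len(cur) + 1) > max_tokens:
            batches.append(cur)
            cur, new_max = [], n
        cur.append(idx)
        cur_max = new_max
    if cur:
        batches.append(cur)
    return batches
```

Because each emitted batch is a contiguous index group, MultiCorpusDataset can route every batch through the transform and collator of the dataset it was drawn from.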

Reviewed By: xiaoxiao26

Differential Revision: D30889985

fbshipit-source-id: 224ad55d2337681a06a82caf19900e5a241a3d6a
Summary:
Fixing issues ([3546](#3546)) with latency-augmented training for MMA due to changes in the fairseq APIs

Pull Request resolved: fairinternal/fairseq-py#2087

Reviewed By: hygong-fb

Differential Revision: D29851286

Pulled By: xutaima

fbshipit-source-id: 6c3077db06b89c23b312b28527d7395a725f3b3a
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3879

Reviewed By: myleott

Differential Revision: D30969142

Pulled By: dianaml0

fbshipit-source-id: 902154c03fd68ae6645d3e0ac07b7d729dfc7934
…change (#2297)

Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: fairinternal/fairseq-py#2297

Reviewed By: alexeib

Differential Revision: D30906090

Pulled By: dianaml0

fbshipit-source-id: 941d30db7f766c9077a1b5bb2a04680f57e2e070
#3773)

Summary:
…verride the defaults

# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes #3761.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3773

Reviewed By: yuntang

Differential Revision: D30310383

Pulled By: kahne

fbshipit-source-id: cbfcbc032dbf53490a25ffdebe57f65c42d52e71
Summary: Update reference from master to main elsewhere in fbcode

Reviewed By: alexeib

Differential Revision: D30938472

fbshipit-source-id: 243b98550207f241c9d3265bf3d4060350aaf0a8
@facebook-github-bot facebook-github-bot deleted the master branch September 20, 2021 21:40