
Wav2vec 2.0 fine-tuning failure

See original GitHub issue

šŸ› Bug

I'm trying to fine-tune the pre-trained wav2vec 2.0 model using fairseq-hydra-train (according to the guide) and I'm getting a RuntimeError saying that the default process group has not been initialized.

To Reproduce

  1. Run cmd
fairseq-hydra-train \
    distributed_training.distributed_port=12345 \
    task.data=/home/ppavlov/wav2vec \
    model.w2v_path=/home/ppavlov/wav2vec/wav2vec_small.pt \
    distributed_training.distributed_world_size=2 \
    +optimization.update_freq='[12]' \
    --config-dir fairseq/examples/wav2vec/config/finetuning \
    --config-name base_1h
  2. See error
Traceback (most recent call last):
  File "/home/ppavlov/wav2vec/fairseq/fairseq_cli/hydra_train.py", line 45, in hydra_main
    distributed_utils.call_main(cfg, pre_main)
  File "/home/ppavlov/wav2vec/fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/home/ppavlov/wav2vec/fairseq/fairseq_cli/train.py", line 128, in main
    trainer = Trainer(cfg, task, model, criterion, quantizer)
  File "/home/ppavlov/wav2vec/fairseq/fairseq/trainer.py", line 144, in __init__
    if self.data_parallel_rank == 0:
  File "/home/ppavlov/wav2vec/fairseq/fairseq/trainer.py", line 177, in data_parallel_rank
    return distributed_utils.get_data_parallel_rank()
  File "/home/ppavlov/wav2vec/fairseq/fairseq/distributed/utils.py", line 463, in get_data_parallel_rank
    return get_rank(get_data_parallel_group())
  File "/home/ppavlov/wav2vec/fairseq/fairseq/distributed/utils.py", line 405, in get_rank
    return dist.get_rank(group=group)
  File "/home/ppavlov/wav2vec/venv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 688, in get_rank
    default_pg = _get_default_group()
  File "/home/ppavlov/wav2vec/venv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 347, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
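
The traceback shows that fairseq reaches torch.distributed.get_rank() before any default process group has been created. As a minimal illustration (plain PyTorch, not fairseq code), calling get_rank() without first calling init_process_group() fails with the same error:

import torch.distributed as dist

# No init_process_group() call has been made, so there is no default group yet.
print(dist.is_initialized())  # False

# Raises: RuntimeError: Default process group has not been initialized,
# please make sure to call init_process_group.
dist.get_rank()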

Environment

  • fairseq Version: master
  • PyTorch Version: 1.8.1
  • OS: Ubuntu 18.04
  • How you installed fairseq: pip install --editable .
  • Python version: 3.8
  • CUDA/cuDNN version: 11.2
  • GPU models and configuration: 2xTesla T4

Working directory content

  • data/
  • dev_other.ltr
  • dev_other.tsv
  • dev_other.wrd
  • dict.ltr.txt
  • fairseq/
  • outputs/
  • train.ltr
  • train.tsv
  • train.wrd
  • venv/
  • wav2vec_small.pt

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5

Top GitHub Comments

4 reactions
petrpavlov commented, Apr 6, 2021

Sorry, I've found the solution: distributed_training.distributed_port in my case (single node) is redundant.
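
For reference, the same reproduction command with that override dropped looks like this (a sketch based on the reporter's original command, paths unchanged):

fairseq-hydra-train \
    task.data=/home/ppavlov/wav2vec \
    model.w2v_path=/home/ppavlov/wav2vec/wav2vec_small.pt \
    distributed_training.distributed_world_size=2 \
    +optimization.update_freq='[12]' \
    --config-dir fairseq/examples/wav2vec/config/finetuning \
    --config-name base_1h

Per the comment above, dropping the port override was enough for the single-node, two-GPU run to initialize correctly.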

2 reactions
petrpavlov commented, Jun 24, 2021

Yes, it would be really great if the documentation was more comprehensive.


Top Results From Across the Web

Wav2vec 2.0 fine-tuning failure · Issue #3451 - GitHub
Bug I'm trying to fine-tune the pre-trained wav2vec 2.0 model using fairseq-hydra-train (according to the guide and getting a RuntimeError ...
Fine-Tune Wav2Vec2 for English ASR with Transformers
Wav2Vec2 is fine-tuned using Connectionist Temporal Classification (CTC), which is an algorithm that is used to train neural networks for ...
Fine-tuning Wav2Vec2 with an LM head | TensorFlow Hub
In this notebook, we will load the pre-trained wav2vec2 model from TFHub and will fine-tune it on LibriSpeech dataset by appending Language Modeling...
analyzing domain shift in self-supervised pre-training - arXiv
Finally, we pre-train a single large wav2vec 2.0 model with 300M parameters [6] on three domains (LL, SF and CV) for 800K steps...
Comparing CTC and LFMMI for out-of-domain adaptation of ...
Fine-tuning the wav2vec 2.0 model with E2E-LFMMI and CTC we obtain the ... Table 1: Comparison of word error rates (WER) (in %)...
