
Wav2vec 2.0 fine-tuning failure

See original GitHub issue

šŸ› Bug

I'm trying to fine-tune the pre-trained wav2vec 2.0 model using fairseq-hydra-train (according to the guide) and I'm getting a RuntimeError saying that the default process group has not been initialized.

To Reproduce

  1. Run cmd
fairseq-hydra-train \
    distributed_training.distributed_port=12345 \
    task.data=/home/ppavlov/wav2vec \
    model.w2v_path=/home/ppavlov/wav2vec/wav2vec_small.pt \
    distributed_training.distributed_world_size=2 \
    +optimization.update_freq='[12]' \
    --config-dir fairseq/examples/wav2vec/config/finetuning \
    --config-name base_1h
  2. See error
Traceback (most recent call last):
  File "/home/ppavlov/wav2vec/fairseq/fairseq_cli/hydra_train.py", line 45, in hydra_main
    distributed_utils.call_main(cfg, pre_main)
  File "/home/ppavlov/wav2vec/fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/home/ppavlov/wav2vec/fairseq/fairseq_cli/train.py", line 128, in main
    trainer = Trainer(cfg, task, model, criterion, quantizer)
  File "/home/ppavlov/wav2vec/fairseq/fairseq/trainer.py", line 144, in __init__
    if self.data_parallel_rank == 0:
  File "/home/ppavlov/wav2vec/fairseq/fairseq/trainer.py", line 177, in data_parallel_rank
    return distributed_utils.get_data_parallel_rank()
  File "/home/ppavlov/wav2vec/fairseq/fairseq/distributed/utils.py", line 463, in get_data_parallel_rank
    return get_rank(get_data_parallel_group())
  File "/home/ppavlov/wav2vec/fairseq/fairseq/distributed/utils.py", line 405, in get_rank
    return dist.get_rank(group=group)
  File "/home/ppavlov/wav2vec/venv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 688, in get_rank
    default_pg = _get_default_group()
  File "/home/ppavlov/wav2vec/venv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 347, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
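
The traceback shows that fairseq reaches torch.distributed.get_rank() before any default process group has been created. As a minimal illustration (plain PyTorch, not fairseq code), calling get_rank() without first calling init_process_group() fails with the same error:

import torch.distributed as dist

# No init_process_group() call has been made, so there is no default group yet.
print(dist.is_initialized())  # False

# Raises: RuntimeError: Default process group has not been initialized,
# please make sure to call init_process_group.
dist.get_rank()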

Environment

  • fairseq Version: master
  • PyTorch Version: 1.8.1
  • OS: Ubuntu 18.04
  • How you installed fairseq: pip install --editable .
  • Python version: 3.8
  • CUDA/cuDNN version: 11.2
  • GPU models and configuration: 2xTesla T4

Working directory content

  • data/
  • dev_other.ltr
  • dev_other.tsv
  • dev_other.wrd
  • dict.ltr.txt
  • fairseq/
  • outputs/
  • train.ltr
  • train.tsv
  • train.wrd
  • venv/
  • wav2vec_small.pt

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5

Top GitHub Comments

4 reactions
petrpavlov commented, Apr 6, 2021

Sorry, I've found the solution: distributed_training.distributed_port in my case (single node) is redundant.
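
For reference, the same reproduction command with that override dropped looks like this (a sketch based on the reporter's original command, paths unchanged):

fairseq-hydra-train \
    task.data=/home/ppavlov/wav2vec \
    model.w2v_path=/home/ppavlov/wav2vec/wav2vec_small.pt \
    distributed_training.distributed_world_size=2 \
    +optimization.update_freq='[12]' \
    --config-dir fairseq/examples/wav2vec/config/finetuning \
    --config-name base_1h

Per the comment above, dropping the port override was enough for the single-node, two-GPU run to initialize correctly.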

2 reactions
petrpavlov commented, Jun 24, 2021

Yes, it would be really great if the documentation was more comprehensive.


Top Results From Across the Web

Wav2vec 2.0 fine-tuning failure · Issue #3451 - GitHub
Bug I'm trying to fine-tune the pre-trained wav2vec 2.0 model using fairseq-hydra-train (according to the guide and getting a RuntimeError ...
Fine-Tune Wav2Vec2 for English ASR with Transformers
Wav2Vec2 is fine-tuned using Connectionist Temporal Classification (CTC), which is an algorithm that is used to train neural networks for ...
Fine-tuning Wav2Vec2 with an LM head | TensorFlow Hub
In this notebook, we will load the pre-trained wav2vec2 model from TFHub and will fine-tune it on LibriSpeech dataset by appending Language Modeling...
analyzing domain shift in self-supervised pre-training - arXiv
Finally, we pre-train a single large wav2vec 2.0 model with 300M parameters [6] on three domains (LL, SF and CV) for 800K steps...
Comparing CTC and LFMMI for out-of-domain adaptation of ...
Fine-tuning the wav2vec 2.0 model with E2E-LFMMI and CTC we obtain the ... Table 1: Comparison of word error rates (WER) (in %)...
