Errors while replicating Example 3: ST on CoVoST
🐛 Bug
Hi, thank you for providing such a nice toolkit. I'm currently working on research about speech translation and tried to replicate your experiment. I followed the instructions here, but could not reproduce your results.
There are four main problems:
- Bug 1: Some Python packages are missing
- Bug 2: Some CoVoST2 audio files are empty
- Bug 3: Path in config file is incorrect
- Bug 4: Tuple indices must be integers
Detailed error messages are at the bottom of this report.
To Reproduce
Steps to reproduce the behavior:
(Command-line options are the same as the provided ones, except that I didn't set `--num-workers` and `--update-freq` when executing `fairseq-train`.)
- Install fairseq following the instructions on GitHub.
- Run `examples/speech_to_text/prep_covost_data.py`. You will see Bug 1.
- Install pandas, torchaudio, and sentencepiece using pip.
- Run `examples/speech_to_text/prep_covost_data.py` again. You will see Bug 2 for some languages (at least I encountered this problem for En, De, and Es).
- Find and remove the empty audio files from `<covost root>/<lang>/raw/clips`, and remove the corresponding rows from the TSV files (see the sketch after this list).
- Run `examples/speech_to_text/prep_covost_data.py` again. It finishes successfully.
- Run the `fairseq-train` command: `fairseq-train ${COVOST_ROOT} --train-subset train_asr_<lang> ...` You will see Bug 3.
- Copy or rename the config file: `cp config_asr_<lang>.yaml config.yaml`
- Change `audio_root` in line 1 of `config.yaml` from `audio_root: <path to covost root>/<lang>` to `audio_root: <path to covost root>`.
- Run the `fairseq-train` command: `fairseq-train ${COVOST_ROOT}/<lang> --train-subset train_asr_<lang> ...` You will see Bug 4.
- Change `fairseq/fairseq/models/transformer.py` lines 809-817 to:

  ```python
  encoder_out[0]
  if (encoder_out is not None and len(encoder_out[0]) > 0)
  else None,
  encoder_out[1]
  if (encoder_out is not None)
  else None,
  ```

- Run the `fairseq-train` command again; training starts successfully.
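
For reference, the "remove empty clips" step could be scripted roughly as below. This is a minimal sketch, not part of fairseq: the paths, the `validated.tsv` file name, and the `path` column are assumptions that would need to be adapted to the actual Common Voice/CoVoST layout.

```python
import csv
from pathlib import Path

# Hypothetical locations; adjust to your CoVoST root and language.
clips_dir = Path("/path/to/covost_root/es/raw/clips")
tsv_in = Path("/path/to/covost_root/es/raw/validated.tsv")
tsv_out = tsv_in.with_suffix(".filtered.tsv")

# Zero-byte mp3 files are the ones torchaudio/sox cannot open.
empty = {p.name for p in clips_dir.glob("*.mp3") if p.stat().st_size == 0}
print(f"found {len(empty)} empty clips")

# Rewrite the TSV without the rows that point at empty clips
# (assumes a tab-separated file with a 'path' column naming the clip).
with tsv_in.open(newline="") as fin, tsv_out.open("w", newline="") as fout:
    reader = csv.DictReader(fin, delimiter="\t")
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, delimiter="\t")
    writer.writeheader()
    for row in reader:
        if row["path"] not in empty:
            writer.writerow(row)

# Finally, delete the empty files themselves.
for name in empty:
    (clips_dir / name).unlink()
```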
Expected behavior
All commands finish without any error.
Environment
- fairseq version: installed via the following commands on Nov. 27, 2020 (commit dea66cc)
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
- PyTorch version: 1.7.0
- OS: Ubuntu 20.04
- Python version: 3.8.5
- CUDA/cuDNN version: 10.2/8.0.3
- GPU models and configuration: GeForce GTX 1080 Ti x 8
Error messages
Error messages for Bug 1
```
Traceback (most recent call last):
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 16, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
Traceback (most recent call last):
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 17, in <module>
import torchaudio
ModuleNotFoundError: No module named 'torchaudio'
...
Traceback (most recent call last):
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 18, in <module>
from examples.speech_to_text.data_utils import (
File "<path to fairseq>/fairseq/examples/speech_to_text/data_utils.py", line 18, in <module>
import sentencepiece as sp
ModuleNotFoundError: No module named 'sentencepiece'
```
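
Before re-running the prep script, a quick way to check that the three dependencies from these tracebacks are importable (a trivial, fairseq-independent snippet):

```python
import importlib

# These are exactly the imports that fail in the tracebacks above.
for name in ("pandas", "torchaudio", "sentencepiece"):
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ModuleNotFoundError:
        print(f"{name}: missing, install it with pip")
```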
Error messages for Bug 2
```
Fetching split train...
100%|██████████████████████████████████| 5.80G/5.80G [10:37<00:00, 9.76MB/s]
100%|██████████████████████████████████| 3.05M/3.05M [00:00<00:00, 3.66MB/s]
Extracting log mel filter bank features...
  0%|                                  | 0/79015 [00:00<?, ?it/s]<path to venv>/venv/lib/python3.8/site-packages/torchaudio/compliance/kaldi.py:574: UserWarning: The function torch.rfft is deprecated and will be removed in a future PyTorch release. Use the new torch.fft module functions, instead, by importing torch.fft and calling torch.fft.fft or torch.fft.rfft. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:590.)
  fft = torch.rfft(strided_input, 1, normalized=False, onesided=True)
 37%|████████████▋                     | 29137/79015 [1:21:38<1:52:09, 7.41it/s]
formats: can't open input file `<path to covost root>/es/raw/clips/common_voice_es_19499893.mp3':
 37%|████████████▋                     | 29138/79015 [1:21:38<2:19:45, 5.95it/s]
Traceback (most recent call last):
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 294, in <module>
main()
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 290, in main
process(args)
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 222, in process
for waveform, sample_rate, _, _, _, utt_id in tqdm(dataset):
File "<path to venv>/venv/lib/python3.8/site-packages/tqdm/std.py", line 1193, in __iter__
for obj in iterable:
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 201, in __getitem__
waveform, sample_rate = torchaudio.load(path)
File "<path to venv>/venv/lib/python3.8/site-packages/torchaudio/backend/sox_backend.py", line 48, in load
sample_rate = _torchaudio.read_audio_file(
RuntimeError: Error opening audio file
```
Error messages for Bug 3
```
2020-11-30 14:36:44 | INFO | fairseq.data.audio.speech_to_text_dataset | Cannot find <path to covost root>/config.yaml
Traceback (most recent call last):
File "<path to venv>/venv/bin/fairseq-train", line 33, in <module>
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 392, in cli_main
distributed_utils.call_main(cfg, main)
File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 313, in call_main
torch.multiprocessing.spawn(
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 300, in distributed_main
main(cfg, **kwargs)
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 66, in main
task = tasks.setup_task(cfg.task)
File "<path to fairseq>/fairseq/fairseq/tasks/__init__.py", line 44, in setup_task
return task.setup_task(cfg, **kwargs)
File "<path to fairseq>/fairseq/fairseq/tasks/speech_to_text.py", line 58, in setup_task
raise FileNotFoundError(f"Dict not found: {dict_path}")
FileNotFoundError: Dict not found: <path to covost root>/dict.txt
```
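
Bug 3 looks like a data-root/config mismatch: the `speech_to_text` task loads `config.yaml` (and the vocabulary file it references) from the directory passed to `fairseq-train`, while the prep script writes `config_asr_<lang>.yaml` under `<covost root>/<lang>`; when no config is found, the task falls back to the default `dict.txt`, which does not exist. A rough layout sanity check (the file names below are assumptions based on the prep script's output, not an official interface):

```python
from pathlib import Path

# Hypothetical paths; point data_root at the directory passed to fairseq-train.
data_root = Path("/path/to/covost_root/es")
lang = "es"

expected = [
    "config.yaml",              # what the task tries to load by default
    f"config_asr_{lang}.yaml",  # what the prep script actually writes
    f"train_asr_{lang}.tsv",
]
for name in expected:
    path = data_root / name
    print(f"{path}: {'found' if path.exists() else 'MISSING'}")
```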
Error messages for Bug 4
```
...
warnings.warn(
epoch 001: 0%| | 0/2 [00:00<?, ?it/s]2020-11-30 14:40:54 | WARNING | fairseq.logging.progress_bar | tensorboard not found, please install with: pip install tensorboardX
2020-11-30 14:40:54 | INFO | fairseq.trainer | begin training epoch 1
Traceback (most recent call last):
File "<path to venv>/venv/bin/fairseq-train", line 33, in <module>
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 392, in cli_main
distributed_utils.call_main(cfg, main)
File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 313, in call_main
torch.multiprocessing.spawn(
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 7 terminated with the following error:
Traceback (most recent call last):
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 300, in distributed_main
main(cfg, **kwargs)
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 130, in main
valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 219, in train
log_output = trainer.train_step(samples)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "<path to fairseq>/fairseq/fairseq/trainer.py", line 540, in train_step
loss, sample_size_i, logging_output = self.task.train_step(
File "<path to fairseq>/fairseq/fairseq/tasks/fairseq_task.py", line 428, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "<path to fairseq>/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 69, in forward
net_output = model(**sample["net_input"])
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "<path to fairseq>/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 259, in forward
decoder_out = self.decoder(
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "<path to fairseq>/fairseq/fairseq/models/transformer.py", line 693, in forward
x, extra = self.extract_features(
File "<path to fairseq>/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 381, in extract_features
x, _ = self.extract_features_scriptable(
File "<path to fairseq>/fairseq/fairseq/models/transformer.py", line 810, in extract_features_scriptable
if (encoder_out is not None and len(encoder_out["encoder_out"]) > 0)
TypeError: tuple indices must be integers or slices, not str
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 8 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
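
For context, the `TypeError` in Bug 4 is a plain type mismatch: at this commit the S2T model hands the decoder a tuple-style encoder output (hence the `encoder_out[0]` / `encoder_out[1]` workaround above), while `extract_features_scriptable` in `transformer.py` indexes it like a dictionary. A minimal illustration of why that raises, using a stand-in value rather than the real encoder output:

```python
# Stand-in for the tuple the decoder actually receives; the real object
# carries encoder states and a padding mask, but any tuple reproduces the error.
encoder_out = ("encoder states", "encoder padding mask")

try:
    encoder_out["encoder_out"]  # the dict-style access done at transformer.py line 810
except TypeError as err:
    print(err)  # tuple indices must be integers or slices, not str
```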
Top GitHub Comments
All solved, thank you!
Please pull the latest master branch for the bug fix.