Generating translation using the 2021 IWSLT multilingual speech translation model
🐛 Bug
I am trying to use the model shared here to generate translations for my speech data, but I am getting this error:
Traceback (most recent call last):
  File "./fairseq/fairseq_cli/generate.py", line 414, in <module>
    cli_main()
  File "./fairseq/fairseq_cli/generate.py", line 402, in cli_main
    parser = options.get_generation_parser()
  File "./fairseq/fairseq/options.py", line 49, in get_generation_parser
    parser = get_parser("Generation", default_task)
  File "./fairseq/fairseq/options.py", line 219, in get_parser
    utils.import_user_module(usr_args)
  File "./fairseq/fairseq/utils.py", line 489, in import_user_module
    import_tasks(tasks_path, f"{module_name}.tasks")
  File "./fairseq/fairseq/tasks/__init__.py", line 117, in import_tasks
    importlib.import_module(namespace + "." + task_name)
  File "/home/anaconda3/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "./fairseq/examples/speech_text_joint_to_text/tasks/speech_text_joint.py", line 38, in <module>
    class SpeechTextJointToTextTask(SpeechToTextTask):
  File "./fairseq/fairseq/tasks/__init__.py", line 71, in register_task_cls
    raise ValueError("Cannot register duplicate task ({})".format(name))
ValueError: Cannot register duplicate task (speech_text_joint_to_text)
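For context, fairseq raises this error when the same task name is registered twice, which usually means the `--user-dir` module ended up being imported under two different module paths (for example, a pip-installed fairseq plus the local clone both pulling in `speech_text_joint_to_text`). Below is a minimal sketch of the registry pattern behind the error; the names are illustrative and simplified, not fairseq's exact internals:

```python
# Sketch of a fairseq-style task registry (simplified, illustrative).
TASK_REGISTRY = {}

def register_task(name):
    def register_task_cls(cls):
        # A second import of the defining module re-runs this decorator,
        # so the name is already present and the ValueError is raised.
        if name in TASK_REGISTRY:
            raise ValueError("Cannot register duplicate task ({})".format(name))
        TASK_REGISTRY[name] = cls
        return cls
    return register_task_cls

@register_task("speech_text_joint_to_text")
class SpeechTextJointToTextTask:
    pass
```

Importing the defining module a second time under a different module name re-executes the decorator and produces exactly the traceback above.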
To Reproduce
I am using the same evaluation script that is shared here:
python ./fairseq/fairseq_cli/generate.py \
${MANIFEST_ROOT} \
--task speech_text_joint_to_text \
--user-dir ./fairseq/examples/speech_text_joint_to_text \
--load-speech-only --gen-subset test_es_en_tedx \
--path ${model} \
--max-source-positions 800000 \
--skip-invalid-size-inputs-valid-test \
--config-yaml config.yaml \
--infer-target-lang en \
--max-tokens 800000 \
--beam 5 \
--results-path ${RESULTS_DIR} \
--scoring sacrebleu
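If the duplicate-task error persists, one quick diagnostic (not part of the original issue, just a hedged suggestion) is to check which fairseq installation Python actually resolves before running the command:

```python
# If this points into site-packages while the command above runs the
# local ./fairseq clone, the user-dir tasks can get imported twice.
import fairseq
print(fairseq.__file__)
```

If it points at site-packages rather than the local clone, uninstalling the pip package or reinstalling the clone in editable mode is a common way to avoid the double import.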
Top GitHub Comments
This problem happens because of an issue in the preprocessing step: the wav2vec encoder receives the raw audio instead of the filterbanks. With the new preprocessing script in the latest commit, this issue has been resolved.
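For reference, filterbank features of the kind the preprocessing is expected to produce can be extracted with torchaudio. This is only a minimal sketch (assuming 80-dim log-mel filterbanks, 16 kHz mono input, and an illustrative file path), not the actual fairseq preprocessing script:

```python
import torchaudio
import torchaudio.compliance.kaldi as kaldi

# Load a 16 kHz mono wav file (path is illustrative).
waveform, sample_rate = torchaudio.load("utterance.wav")

# 80-dimensional log-mel filterbanks, the usual speech-to-text input
# in fairseq recipes; output shape is (num_frames, 80).
fbank = kaldi.fbank(
    waveform,
    num_mel_bins=80,
    sample_frequency=sample_rate,
)
print(fbank.shape)
```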
The next problem that I am encountering is this one: