Generating translation using the 2021 IWSLT multilingual speech translation model
🐛 Bug
I am trying to use the model shared here to generate translations for my speech data, but I am getting this error:
Traceback (most recent call last):
  File "./fairseq/fairseq_cli/generate.py", line 414, in <module>
    cli_main()
  File "./fairseq/fairseq_cli/generate.py", line 402, in cli_main
    parser = options.get_generation_parser()
  File "./fairseq/fairseq/options.py", line 49, in get_generation_parser
    parser = get_parser("Generation", default_task)
  File "./fairseq/fairseq/options.py", line 219, in get_parser
    utils.import_user_module(usr_args)
  File "./fairseq/fairseq/utils.py", line 489, in import_user_module
    import_tasks(tasks_path, f"{module_name}.tasks")
  File "./fairseq/fairseq/tasks/__init__.py", line 117, in import_tasks
    importlib.import_module(namespace + "." + task_name)
  File "/home/anaconda3/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "./fairseq/examples/speech_text_joint_to_text/tasks/speech_text_joint.py", line 38, in <module>
    class SpeechTextJointToTextTask(SpeechToTextTask):
  File "./fairseq/fairseq/tasks/__init__.py", line 71, in register_task_cls
    raise ValueError("Cannot register duplicate task ({})".format(name))
ValueError: Cannot register duplicate task (speech_text_joint_to_text)
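For context, fairseq raises this error when the same task name is registered twice, which usually means the `--user-dir` module ended up being imported under two different module paths (for example, a pip-installed fairseq plus the local clone both pulling in `speech_text_joint_to_text`). Below is a minimal sketch of the registry pattern behind the error; the names are illustrative and simplified, not fairseq's exact internals:

```python
# Sketch of a fairseq-style task registry (simplified, illustrative).
TASK_REGISTRY = {}

def register_task(name):
    def register_task_cls(cls):
        # A second import of the defining module re-runs this decorator,
        # so the name is already present and the ValueError is raised.
        if name in TASK_REGISTRY:
            raise ValueError("Cannot register duplicate task ({})".format(name))
        TASK_REGISTRY[name] = cls
        return cls
    return register_task_cls

@register_task("speech_text_joint_to_text")
class SpeechTextJointToTextTask:
    pass
```

Importing the defining module a second time under a different module name re-executes the decorator and produces exactly the traceback above.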
To Reproduce
I am using the same evaluation script that is shared here:
python ./fairseq/fairseq_cli/generate.py \
${MANIFEST_ROOT} \
--task speech_text_joint_to_text \
--user-dir ./fairseq/examples/speech_text_joint_to_text \
--load-speech-only --gen-subset test_es_en_tedx \
--path ${model} \
--max-source-positions 800000 \
--skip-invalid-size-inputs-valid-test \
--config-yaml config.yaml \
--infer-target-lang en \
--max-tokens 800000 \
--beam 5 \
--results-path ${RESULTS_DIR} \
--scoring sacrebleu
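If the duplicate-task error persists, one quick diagnostic (not part of the original issue, just a hedged suggestion) is to check which fairseq installation Python actually resolves before running the command:

```python
# If this points into site-packages while the command above runs the
# local ./fairseq clone, the user-dir tasks can get imported twice.
import fairseq
print(fairseq.__file__)
```

If it points at site-packages rather than the local clone, uninstalling the pip package or reinstalling the clone in editable mode is a common way to avoid the double import.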
Top GitHub Comments
This problem happens because of an issue in the preprocessing step: the wav2vec encoder receives the raw audio instead of the filterbanks. With the new preprocessing script in the latest commit, this issue has been resolved.
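For reference, filterbank features of the kind the preprocessing is expected to produce can be extracted with torchaudio. This is only a minimal sketch (assuming 80-dim log-mel filterbanks, 16 kHz mono input, and an illustrative file path), not the actual fairseq preprocessing script:

```python
import torchaudio
import torchaudio.compliance.kaldi as kaldi

# Load a 16 kHz mono wav file (path is illustrative).
waveform, sample_rate = torchaudio.load("utterance.wav")

# 80-dimensional log-mel filterbanks, the usual speech-to-text input
# in fairseq recipes; output shape is (num_frames, 80).
fbank = kaldi.fbank(
    waveform,
    num_mel_bins=80,
    sample_frequency=sample_rate,
)
print(fbank.shape)
```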
The next problem that I am encountering is this one: