
Segmentation fault when training speech_to_text model following instruction in examples/speech_to_text

See original GitHub issue

🐛 Bug

I have followed the README in examples/speech_to_text to reproduce the ST results on MuST-C. But when I start training (after preprocessing according to the instructions), the system raises a segmentation fault just after reading the dev subset.

Below is part of the log output:

2020-10-23 18:09:18 | INFO | fairseq.tasks.speech_to_text | dictionary size (spm_bpe10000_st.txt): 10,000
2020-10-23 18:09:18 | INFO | fairseq.tasks.speech_to_text | pre-tokenizer: {'tokenizer': None}
2020-10-23 18:09:18 | INFO | fairseq.tasks.speech_to_text | tokenizer: {'bpe': 'sentencepiece', 'sentencepiece_model': '/home/ma-user/work/data/mustc-s2t/en-de/spm_bpe10000_st.model'}
2020-10-23 18:09:18 | INFO | fairseq.data.audio.speech_to_text_dataset | SpeechToTextDataset(split="valid_st", n_samples=1388, prepend_tgt_lang_tag=False, shuffle=False, transforms=None)
mustc-test-s2t-cd.sh: line 11: 33140 Segmentation fault      CUDA_VISIBLE_DEVICES=0 python fairseq_cli/train.py ${data_dir} --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st --save-dir ${model_dir} --num-workers 1 --max-tokens 20000 --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-update 100000 --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1

To Reproduce

CUDA_VISIBLE_DEVICES=0 python fairseq_cli/train.py ${data_dir} --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st --save-dir ${model_dir} --num-workers 1 --max-tokens 20000 --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-update 100000 --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1
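
A hedged debugging note (not from the original issue): running the same command with Python's built-in faulthandler enabled makes the interpreter dump a Python-level traceback when the process receives SIGSEGV, which usually narrows the crash down to a specific data-loading or extension call. Only the -X faulthandler flag below is new; everything else is the reproduce command above.

# Sketch: identical to the reproduce command, plus -X faulthandler so that a
# segmentation fault prints the Python traceback of the crashing thread.
CUDA_VISIBLE_DEVICES=0 python -X faulthandler fairseq_cli/train.py ${data_dir} \
  --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st \
  --save-dir ${model_dir} --num-workers 1 --max-tokens 20000 --task speech_to_text \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-update 100000 \
  --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --clip-norm 10.0 --seed 1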

Code sample

Expected behavior

Environment

  • fairseq Version (e.g., 1.0 or master): 1.0
  • PyTorch Version (e.g., 1.0): 1.4.0
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source):
  • Python version: 3.7
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:
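
The blank fields above (build command, CUDA/cuDNN version, GPU models) can be filled in with PyTorch's environment collection script plus nvidia-smi; this is a suggestion for completeness, not something from the original report.

# Prints the installed PyTorch build, CUDA/cuDNN versions, GPU models, and OS details.
python -m torch.utils.collect_env
# Driver version and GPUs as seen by the system:
nvidia-smi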

Additional context

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 23 (6 by maintainers)

Top GitHub Comments

1 reaction
zxshamson commented, Nov 3, 2020

@kahne I have just tried PyTorch 1.5, and it also works well.
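
Since newer PyTorch appears to avoid the crash, a quick way to check and upgrade the installed version (the exact wheel is an example and should match your CUDA toolkit):

# Show the installed PyTorch version and the CUDA version it was built against.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
# Example upgrade to the 1.5 series; pick the build matching your CUDA toolkit.
pip install torch==1.5.0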

0 reactions
holyma commented, Sep 14, 2021

I have run en-de ASR on MuST-C with 1 GPU, following these instructions, but it still does not converge. Below is my result:

| INFO | train | epoch 160 | loss 4.577 | nll_loss 3.24 | total 12454.7 | n_correct 6609.75 | ppl 9.45 | accuracy 53.07 | wps 4776.8 | ups 0.38 | wpb 12454.7 | bsz 455 | num_updates 78241 | lr 0.000357506 | gnorm 0.394 | clip 0 | train_wall 1201 | gb_free 13.3 | wall 40551

Is this normal? @kahne @zxshamson

Did you set --update-freq to 8?

Yes, this is my command:

fairseq-train ${MUSTC_ROOT}/en-de \
  --config-yaml config_asr.yaml --train-subset train_asr --valid-subset dev_asr \
  --save-dir ${CHECKPOINT}/mustc_asr --num-workers 4 --max-tokens 40000 --max-update 100000 \
  --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy --fp16 \
  --arch s2t_transformer_s --optimizer adam --lr 1e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8 --tensorboard-logdir $LOG/mustc_asr | tee $LOG/mustc_asr/train_7.log
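
For context on the --update-freq question: the recipe was written for multi-GPU training, and gradient accumulation via --update-freq keeps the effective tokens per update comparable on a single GPU. A rough check with the numbers from the command above (a back-of-the-envelope sketch, not output from the thread):

# Effective tokens per update ≈ max-tokens × update-freq × num_gpus
echo $(( 40000 * 8 * 1 ))   # 320000, i.e. the same as 8 GPUs at 40000 tokens with update-freq 1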

I have downloaded your pretrained model, and it runs well.
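
For anyone who wants to sanity-check against a released checkpoint in the same way, here is a generation sketch along the lines of the examples/speech_to_text README; the checkpoint path is a placeholder and the flags may need adjusting for your setup:

# Sketch: decode the MuST-C ASR test set with a downloaded checkpoint and report WER.
# ${PRETRAINED_CKPT} is a placeholder for the path to the downloaded model.
fairseq-generate ${MUSTC_ROOT}/en-de \
  --config-yaml config_asr.yaml --gen-subset tst-COMMON_asr --task speech_to_text \
  --path ${PRETRAINED_CKPT} --max-tokens 50000 --beam 5 --scoring wer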


