
Segmentation fault when training speech_to_text model following instruction in examples/speech_to_text

See original GitHub issue

🐛 Bug

I have followed the README in examples/speech_to_text to reproduce the ST results on MuST-C. But when I start training (after preprocessing according to the instructions), the system raises a segmentation fault just after reading the dev subset.

Below is part of the log output:

2020-10-23 18:09:18 | INFO | fairseq.tasks.speech_to_text | dictionary size (spm_bpe10000_st.txt): 10,000
2020-10-23 18:09:18 | INFO | fairseq.tasks.speech_to_text | pre-tokenizer: {'tokenizer': None}
2020-10-23 18:09:18 | INFO | fairseq.tasks.speech_to_text | tokenizer: {'bpe': 'sentencepiece', 'sentencepiece_model': '/home/ma-user/work/data/mustc-s2t/en-de/spm_bpe10000_st.model'}
2020-10-23 18:09:18 | INFO | fairseq.data.audio.speech_to_text_dataset | SpeechToTextDataset(split="valid_st", n_samples=1388, prepend_tgt_lang_tag=False, shuffle=False, transforms=None)
mustc-test-s2t-cd.sh: line 11: 33140 Segmentation fault      CUDA_VISIBLE_DEVICES=0 python fairseq_cli/train.py ${data_dir} --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st --save-dir ${model_dir} --num-workers 1 --max-tokens 20000 --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-update 100000 --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1

To Reproduce

CUDA_VISIBLE_DEVICES=0 python fairseq_cli/train.py ${data_dir} --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st --save-dir ${model_dir} --num-workers 1 --max-tokens 20000 --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-update 100000 --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1
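
A hedged debugging note (not from the original issue): running the same command with Python's built-in faulthandler enabled makes the interpreter dump a Python-level traceback when the process receives SIGSEGV, which usually narrows the crash down to a specific data-loading or extension call. Only the -X faulthandler flag below is new; everything else is the reproduce command above.

# Sketch: identical to the reproduce command, plus -X faulthandler so that a
# segmentation fault prints the Python traceback of the crashing thread.
CUDA_VISIBLE_DEVICES=0 python -X faulthandler fairseq_cli/train.py ${data_dir} \
  --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st \
  --save-dir ${model_dir} --num-workers 1 --max-tokens 20000 --task speech_to_text \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-update 100000 \
  --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --clip-norm 10.0 --seed 1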

Code sample

Expected behavior

Environment

  • fairseq Version (e.g., 1.0 or master): 1.0
  • PyTorch Version (e.g., 1.0): 1.4.0
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source):
  • Python version: 3.7
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:
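
The blank fields above (build command, CUDA/cuDNN version, GPU models) can be filled in with PyTorch's environment collection script plus nvidia-smi; this is a suggestion for completeness, not something from the original report.

# Prints the installed PyTorch build, CUDA/cuDNN versions, GPU models, and OS details.
python -m torch.utils.collect_env
# Driver version and GPUs as seen by the system:
nvidia-smi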

Additional context

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 23 (6 by maintainers)

Top GitHub Comments

1 reaction
zxshamson commented, Nov 3, 2020

@kahne I have just tried PyTorch 1.5, and it also works well.
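
Since newer PyTorch appears to avoid the crash, a quick way to check and upgrade the installed version (the exact wheel is an example and should match your CUDA toolkit):

# Show the installed PyTorch version and the CUDA version it was built against.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
# Example upgrade to the 1.5 series; pick the build matching your CUDA toolkit.
pip install torch==1.5.0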

0 reactions
holyma commented, Sep 14, 2021

I have run en-de ASR on MuST-C with 1 GPU, following these instructions, but it still does not converge. Below is my result:

| INFO | train | epoch 160 | loss 4.577 | nll_loss 3.24 | total 12454.7 | n_correct 6609.75 | ppl 9.45 | accuracy 53.07 | wps 4776.8 | ups 0.38 | wpb 12454.7 | bsz 455 | num_updates 78241 | lr 0.000357506 | gnorm 0.394 | clip 0 | train_wall 1201 | gb_free 13.3 | wall 40551

Is this normal? @kahne @zxshamson

Did you set --update-freq to 8?

Yes, this is my command:

fairseq-train ${MUSTC_ROOT}/en-de \
  --config-yaml config_asr.yaml --train-subset train_asr --valid-subset dev_asr \
  --save-dir ${CHECKPOINT}/mustc_asr --num-workers 4 --max-tokens 40000 --max-update 100000 \
  --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy --fp16 \
  --arch s2t_transformer_s --optimizer adam --lr 1e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8 --tensorboard-logdir $LOG/mustc_asr | tee $LOG/mustc_asr/train_7.log
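
For context on the --update-freq question: the recipe was written for multi-GPU training, and gradient accumulation via --update-freq keeps the effective tokens per update comparable on a single GPU. A rough check with the numbers from the command above (a back-of-the-envelope sketch, not output from the thread):

# Effective tokens per update ≈ max-tokens × update-freq × num_gpus
echo $(( 40000 * 8 * 1 ))   # 320000, i.e. the same as 8 GPUs at 40000 tokens with update-freq 1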

I have downloaded your pretrained model, and it runs well.
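
For anyone who wants to sanity-check against a released checkpoint in the same way, here is a generation sketch along the lines of the examples/speech_to_text README; the checkpoint path is a placeholder and the flags may need adjusting for your setup:

# Sketch: decode the MuST-C ASR test set with a downloaded checkpoint and report WER.
# ${PRETRAINED_CKPT} is a placeholder for the path to the downloaded model.
fairseq-generate ${MUSTC_ROOT}/en-de \
  --config-yaml config_asr.yaml --gen-subset tst-COMMON_asr --task speech_to_text \
  --path ${PRETRAINED_CKPT} --max-tokens 50000 --beam 5 --scoring wer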


