Exploding loss in WMT14 en-fr
Hello. I've preprocessed my data and set the training parameters to match those in the pre-trained model's README (wmt14.en-fr.fconv-py/README.md). However, I get:
| [en] dictionary: 43881 types
| [fr] dictionary: 43978 types
| data-bin train 35482842 examples
| data-bin valid 26663 examples
| data-bin test 3003 examples
| using 8 GPUs (with max tokens per GPU = 4000)
| model fconv_wmt_en_fr
Warning! 1 samples are either too short or too long and will be ignored, sample ids=[28743556]
| epoch 001 1000 / 331737 loss=9.57 (10.94), wps=18515, wpb=31259, bsz=861, lr=1.25, clip=100%, gnorm=2.0540
| epoch 001 2000 / 331737 loss=8.61 (9.91), wps=18466, wpb=31229, bsz=877, lr=1.25, clip=100%, gnorm=1.7149
| epoch 001 3000 / 331737 loss=7.50 (9.23), wps=18493, wpb=31226, bsz=871, lr=1.25, clip=100%, gnorm=2.7501
| epoch 001 4000 / 331737 loss=6.87 (8.75), wps=18522, wpb=31231, bsz=873, lr=1.25, clip=100%, gnorm=100615.8788
| epoch 001 5000 / 331737 loss=10405.01 (136.96), wps=18532, wpb=31216, bsz=874, lr=1.25, clip=100%, gnorm=1500459828271.3960
| epoch 001 6000 / 331737 loss=4773454961.36 (92926125.94), wps=18564, wpb=31213, bsz=867, lr=1.25, clip=100%, gnorm=37459419138681.4219
| epoch 001 7000 / 331737 loss=7746569234820.15 (126329286789.38), wps=18577, wpb=31211, bsz=864, lr=1.25, clip=100%, gnorm=inf
| epoch 001 8000 / 331737 loss=18016233617.10 (228909462625.55), wps=18562, wpb=31205, bsz=866, lr=1.25, clip=100%, gnorm=inf
| epoch 001 9000 / 331737 loss=6500325670920.53 (321325856038.58), wps=18597, wpb=31214, bsz=860, lr=1.25, clip=100%, gnorm=inf
| epoch 001 10000 / 331737 loss=11162501170786.86 (715142464195.40), wps=18609, wpb=31219, bsz=858, lr=1.25, clip=100%, gnorm=inf
....
--------------------------ENV----------------------------
8 x NVIDIA P40 GPUs
--------------------------DATA PREPROCESSING----------------------------
- normalize-punctuation
- tokenizer
- clean-corpus-n
- shuffle
- learn and apply BPE

I've checked that the en-fr sentence pairs are still correctly aligned after preprocessing. A sketch of this pipeline is given below.
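For concreteness, a minimal sketch of such a pipeline using the Moses scripts and subword-nmt. The corpus file names, script paths, sentence-length limit, and the 40k BPE merge operations are assumptions, not the exact commands used here.

# assumed locations of the Moses scripts and subword-nmt -- adjust to your setup
MOSES=~/mosesdecoder/scripts
BPE=~/subword-nmt

# normalize punctuation and tokenize each side
for l in en fr; do
  cat corpus.$l \
    | perl $MOSES/tokenizer/normalize-punctuation.perl -l $l \
    | perl $MOSES/tokenizer/tokenizer.perl -threads 8 -l $l \
    > corpus.tok.$l
done

# drop pairs that are empty or longer than 175 tokens (keeps en/fr aligned)
perl $MOSES/training/clean-corpus-n.perl corpus.tok en fr corpus.clean 1 175

# shuffle both sides with the same permutation
paste corpus.clean.en corpus.clean.fr | shuf > corpus.shuf.tsv
cut -f1 corpus.shuf.tsv > corpus.shuf.en
cut -f2 corpus.shuf.tsv > corpus.shuf.fr

# learn a joint BPE model and apply it to both sides
cat corpus.shuf.en corpus.shuf.fr | python $BPE/learn_bpe.py -s 40000 > bpe.codes
for l in en fr; do
  python $BPE/apply_bpe.py -c bpe.codes < corpus.shuf.$l > corpus.bpe.$l
done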
--------------------------TRAINING PARAMETER----------------------------
fairseq_train_param="-s en -t fr --arch fconv_wmt_en_fr
  --dropout 0.1 --lr 1.25 --clip-norm 0.1 --max-tokens 4000 --force-anneal 32"
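For reference, a minimal sketch of how these flags might be passed to fairseq-py's train.py; the data-bin location and checkpoint directory below are assumptions.

# hypothetical paths -- point these at your own binarized data and checkpoint dir
DATA=data-bin/wmt14_en_fr
python train.py $DATA \
  -s en -t fr --arch fconv_wmt_en_fr \
  --dropout 0.1 --lr 1.25 --clip-norm 0.1 \
  --max-tokens 4000 --force-anneal 32 \
  --save-dir checkpoints/fconv_wmt_en_fr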
Can you help me figure out what the problem is? Thank you.
Top GitHub Comments
Hi @edunov, I found some differences in our reported results:
Your reported result above is in the format of:
which might be directly calculated by generate.py.
Mine, meanwhile, is in the format of:
It is achieved in this way:
Can you paste your result of score.py for reference? Thank you.

We recently discovered an issue with recent versions of PyTorch and our multi-GPU training code. The fix is here: https://github.com/facebookresearch/fairseq-py/commit/d7d82715f968097bba08c92416d332d969bd1f06. Can you update your fairseq-py or apply the fix and see if it solves the exploding loss issue?
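One way to pick up that fix in an existing checkout might look like the following sketch; the clone path and branch name are assumptions.

# assumed local clone of fairseq-py -- adjust the path to your checkout
cd ~/fairseq-py
git fetch origin
# either update to the latest master, which includes the fix ...
git checkout master && git pull origin master
# ... or cherry-pick only the fix commit onto your current branch
git cherry-pick d7d82715f968097bba08c92416d332d969bd1f06
# then rebuild/reinstall per the repository README so the patched code is used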