Exploding loss in WMT14 en-fr
Hello. I've preprocessed my data and set the training parameters to match those in the pre-trained model's README (wmt14.en-fr.fconv-py/README.md). However, I get:
| [en] dictionary: 43881 types
| [fr] dictionary: 43978 types
| data-bin train 35482842 examples
| data-bin valid 26663 examples
| data-bin test 3003 examples
| using 8 GPUs (with max tokens per GPU = 4000)
| model fconv_wmt_en_fr
Warning! 1 samples are either too short or too long and will be ignored, sample ids=[28743556]
| epoch 001 1000 / 331737 loss=9.57 (10.94), wps=18515, wpb=31259, bsz=861, lr=1.25, clip=100%, gnorm=2.0540
| epoch 001 2000 / 331737 loss=8.61 (9.91), wps=18466, wpb=31229, bsz=877, lr=1.25, clip=100%, gnorm=1.7149
| epoch 001 3000 / 331737 loss=7.50 (9.23), wps=18493, wpb=31226, bsz=871, lr=1.25, clip=100%, gnorm=2.7501
| epoch 001 4000 / 331737 loss=6.87 (8.75), wps=18522, wpb=31231, bsz=873, lr=1.25, clip=100%, gnorm=100615.8788
| epoch 001 5000 / 331737 loss=10405.01 (136.96), wps=18532, wpb=31216, bsz=874, lr=1.25, clip=100%, gnorm=1500459828271.3960
| epoch 001 6000 / 331737 loss=4773454961.36 (92926125.94), wps=18564, wpb=31213, bsz=867, lr=1.25, clip=100%, gnorm=37459419138681.4219
| epoch 001 7000 / 331737 loss=7746569234820.15 (126329286789.38), wps=18577, wpb=31211, bsz=864, lr=1.25, clip=100%, gnorm=inf
| epoch 001 8000 / 331737 loss=18016233617.10 (228909462625.55), wps=18562, wpb=31205, bsz=866, lr=1.25, clip=100%, gnorm=inf
| epoch 001 9000 / 331737 loss=6500325670920.53 (321325856038.58), wps=18597, wpb=31214, bsz=860, lr=1.25, clip=100%, gnorm=inf
| epoch 001 10000 / 331737 loss=11162501170786.86 (715142464195.40), wps=18609, wpb=31219, bsz=858, lr=1.25, clip=100%, gnorm=inf
....
--------------------------ENV----------------------------
8 x NVIDIA P40 GPUs
--------------------------DATA PREPROCESSING----------------------------
- normalize-punctuation
- tokenizer
- clean-corpus-n
- shuffle
- learn and apply BPE

I've checked that the en-fr sentence pairs are still correctly aligned after preprocessing. A sketch of this pipeline is given below.
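For concreteness, a minimal sketch of such a pipeline using the Moses scripts and subword-nmt. The corpus file names, script paths, sentence-length limit, and the 40k BPE merge operations are assumptions, not the exact commands used here.

# assumed locations of the Moses scripts and subword-nmt -- adjust to your setup
MOSES=~/mosesdecoder/scripts
BPE=~/subword-nmt

# normalize punctuation and tokenize each side
for l in en fr; do
  cat corpus.$l \
    | perl $MOSES/tokenizer/normalize-punctuation.perl -l $l \
    | perl $MOSES/tokenizer/tokenizer.perl -threads 8 -l $l \
    > corpus.tok.$l
done

# drop pairs that are empty or longer than 175 tokens (keeps en/fr aligned)
perl $MOSES/training/clean-corpus-n.perl corpus.tok en fr corpus.clean 1 175

# shuffle both sides with the same permutation
paste corpus.clean.en corpus.clean.fr | shuf > corpus.shuf.tsv
cut -f1 corpus.shuf.tsv > corpus.shuf.en
cut -f2 corpus.shuf.tsv > corpus.shuf.fr

# learn a joint BPE model and apply it to both sides
cat corpus.shuf.en corpus.shuf.fr | python $BPE/learn_bpe.py -s 40000 > bpe.codes
for l in en fr; do
  python $BPE/apply_bpe.py -c bpe.codes < corpus.shuf.$l > corpus.bpe.$l
done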
--------------------------TRAINING PARAMETER----------------------------
fairseq_train_param="-s en -t fr --arch fconv_wmt_en_fr
  --dropout 0.1 --lr 1.25 --clip-norm 0.1 --max-tokens 4000 --force-anneal 32"
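For reference, a minimal sketch of how these flags might be passed to fairseq-py's train.py; the data-bin location and checkpoint directory below are assumptions.

# hypothetical paths -- point these at your own binarized data and checkpoint dir
DATA=data-bin/wmt14_en_fr
python train.py $DATA \
  -s en -t fr --arch fconv_wmt_en_fr \
  --dropout 0.1 --lr 1.25 --clip-norm 0.1 \
  --max-tokens 4000 --force-anneal 32 \
  --save-dir checkpoints/fconv_wmt_en_fr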
Can you help me figure out what the problem is? Thank you.
Top GitHub Comments
Hi @edunov, I found some differences in our reported results:
Your reported result above is in the format of:
which might be directly calculated by generate.py.
Mine, meanwhile, is in the format of:
It is achieved in this way:
Can you paste your result of score.py for reference? Thank you.

We recently discovered an issue with recent versions of PyTorch and our multi-GPU training code. The fix is here: https://github.com/facebookresearch/fairseq-py/commit/d7d82715f968097bba08c92416d332d969bd1f06. Can you update your fairseq-py or apply the fix and see if it solves the exploding loss issue?
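One way to pick up that fix in an existing checkout might look like the following sketch; the clone path and branch name are assumptions.

# assumed local clone of fairseq-py -- adjust the path to your checkout
cd ~/fairseq-py
git fetch origin
# either update to the latest master, which includes the fix ...
git checkout master && git pull origin master
# ... or cherry-pick only the fix commit onto your current branch
git cherry-pick d7d82715f968097bba08c92416d332d969bd1f06
# then rebuild/reinstall per the repository README so the patched code is used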