Reproducing results on WMT'14 en-fr
Following the latest code, with the training parameters specified by @edunov in https://github.com/facebookresearch/fairseq-py/issues/41 and in the Readme.md of Pretrained-models, I got exploding updates on WMT14 en-fr:
+ miniconda3/bin/python3 PyFairseq/train.py data-bin --save-dir model -s en -t fr --arch fconv_wmt_en_fr --dropout 0.1 --lr 2.5 --clip-norm 0.1 --max-tokens 4000 --force-anneal 32
Namespace(adam_betas='(0.9, 0.999)', arch='fconv_wmt_en_fr', clip_norm=0.1, curriculum=0, data='data-bin', decoder_attention='True', decoder_embed_dim=768, decoder_layers='[(512, 3)] * 6 + [(768, 3)] * 4 + [(1024, 3)] * 3 + [(2048, 1)] * 1 + [(4096, 1)] * 1', decoder_out_embed_dim=512, dropout=0.1, encoder_embed_dim=768, encoder_layers='[(512, 3)] * 6 + [(768, 3)] * 4 + [(1024, 3)] * 3 + [(2048, 1)] * 1 + [(4096, 1)] * 1', force_anneal=32, label_smoothing=0, log_format=None, log_interval=1000, lr='2.5', lrshrink=0.1, max_epoch=0, max_sentences=None, max_source_positions=1024, max_target_positions=1024, max_tokens=4000, min_lr=1e-05, model='fconv', momentum=0.99, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, num_gpus=8, optimizer='nag', restore_file='checkpoint_last.pt', sample_without_replacement=0, save_dir='model', save_interval=-1, seed=1, sentence_avg=False, skip_invalid_size_inputs_valid_test=False, source_lang='en', target_lang='fr', train_subset='train', valid_subset='valid', weight_decay=0.0, workers=1)
| [en] dictionary: 43881 types
| [fr] dictionary: 43978 types
| data-bin train 35482842 examples
| data-bin valid 26663 examples
| using 8 GPUs (with max tokens per GPU = 4000 and max sentences per GPU = None)
| model fconv_wmt_en_fr, criterion CrossEntropyCriterion
Warning! 1 samples are either too short or too long and will be ignored, first few sample ids=[28743556]
| epoch 001: 1000 / 331737 loss=9.21 (10.89), wps=16319, wpb=31291, bsz=850, lr=2.5, clip=100%, gnorm=2.50713, oom=0
| epoch 001: 2000 / 331737 loss=588.92 (19.76), wps=16417, wpb=31241, bsz=838, lr=2.5, clip=100%, gnorm=5.39344e+09, oom=0
| epoch 001: 3000 / 331737 loss=126867869305.41 (3395251823.97), wps=16436, wpb=31258, bsz=849, lr=2.5, clip=100%, gnorm=2.05028e+16, oom=0
| epoch 001: 4000 / 331737 loss=137727644131954352.00 (3821157344375131.00), wps=16438, wpb=31229, bsz=853, lr=2.5, clip=100%, gnorm=inf, oom=0
| epoch 001: 5000 / 331737 loss=358248860949876800.00 (64219013624718560.00), wps=16454, wpb=31251, bsz=861, lr=2.5, clip=100%, gnorm=inf, oom=0
| epoch 001: 6000 / 331737 loss=74803270219822464.00 (85287362140370208.00), wps=16464, wpb=31255, bsz=857, lr=2.5, clip=100%, gnorm=inf, oom=0
| epoch 001: 7000 / 331737 loss=1124810776667683.12 (75791177781467504.00), wps=16478, wpb=31266, bsz=854, lr=2.5, clip=100%, gnorm=inf, oom=0
| epoch 001: 8000 / 331737 loss=nan (nan), wps=16486, wpb=31252, bsz=852, lr=2.5, clip=94%, gnorm=nan, oom=0
| epoch 001: 9000 / 331737 loss=nan (nan), wps=16493, wpb=31241, bsz=852, lr=2.5, clip=83%, gnorm=nan, oom=0
| epoch 001: 10000 / 331737 loss=nan (nan), wps=16502, wpb=31244, bsz=855, lr=2.5, clip=75%, gnorm=nan, oom=0
| epoch 001: 11000 / 331737 loss=nan (nan), wps=16511, wpb=31239, bsz=855, lr=2.5, clip=68%, gnorm=nan, oom=0
| epoch 001: 12000 / 331737 loss=nan (nan), wps=16521, wpb=31240, bsz=855, lr=2.5, clip=62%, gnorm=nan, oom=0
| epoch 001: 13000 / 331737 loss=nan (nan), wps=16529, wpb=31244, bsz=853, lr=2.5, clip=58%, gnorm=nan, oom=0
| epoch 001: 14000 / 331737 loss=nan (nan), wps=16536, wpb=31239, bsz=851, lr=2.5, clip=53%, gnorm=nan, oom=0
| epoch 001: 15000 / 331737 loss=nan (nan), wps=16539, wpb=31236, bsz=852, lr=2.5, clip=50%, gnorm=nan, oom=0
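For context on the clip/gnorm columns: fairseq rescales gradients whose global L2 norm exceeds --clip-norm, and clip=100% means every batch hit that threshold. Below is a minimal sketch of that behavior, modeled on PyTorch's torch.nn.utils.clip_grad_norm_ rather than fairseq's exact implementation:

```python
import torch

def clip_grad_norm(parameters, max_norm):
    """Rescale gradients in place so their global L2 norm is at most max_norm.

    Returns the pre-clip norm, which is what the gnorm column reports.
    """
    grads = [p.grad for p in parameters if p.grad is not None]
    total_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2)
    clip_coef = float(max_norm) / (float(total_norm) + 1e-6)
    if clip_coef < 1.0:  # only shrink gradients, never scale them up
        for g in grads:
            g.mul_(clip_coef)
    return total_norm
```

Clipping caps the gradient magnitude per step, but once the loss itself overflows to inf/nan (step 4000 onward in the log above) no amount of clipping recovers the run; lowering the learning rate, as tried below, is the usual remedy.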
Changing only the learning rate to 1.25 avoids the exploding updates, but BLEU increases very slowly:
checkpoint1.pt/test.bleu:BLEU4 = 30.11, 59.5/35.8/23.8/16.2 (BP=1.000, ratio=0.975, syslen=83264, reflen=81204)
checkpoint2.pt/test.bleu:BLEU4 = 31.34, 60.4/37.1/25.0/17.2 (BP=1.000, ratio=0.986, syslen=82348, reflen=81204)
checkpoint3.pt/test.bleu:BLEU4 = 32.56, 61.4/38.4/26.1/18.2 (BP=1.000, ratio=0.988, syslen=82230, reflen=81204)
checkpoint4.pt/test.bleu:BLEU4 = 32.71, 61.5/38.5/26.3/18.4 (BP=1.000, ratio=0.989, syslen=82140, reflen=81204)
checkpoint5.pt/test.bleu:BLEU4 = 33.13, 62.0/38.9/26.7/18.7 (BP=1.000, ratio=0.997, syslen=81437, reflen=81204)
checkpoint6.pt/test.bleu:BLEU4 = 33.04, 61.5/38.8/26.7/18.7 (BP=1.000, ratio=0.995, syslen=81632, reflen=81204)
checkpoint7.pt/test.bleu:BLEU4 = 33.01, 61.6/38.8/26.6/18.7 (BP=1.000, ratio=0.987, syslen=82282, reflen=81204)
checkpoint8.pt/test.bleu:BLEU4 = 33.60, 62.2/39.4/27.2/19.1 (BP=1.000, ratio=0.992, syslen=81830, reflen=81204)
checkpoint9.pt/test.bleu:BLEU4 = 33.07, 61.6/38.9/26.7/18.7 (BP=1.000, ratio=0.993, syslen=81783, reflen=81204)
checkpoint10.pt/test.bleu:BLEU4 = 33.39, 62.2/39.3/27.0/19.0 (BP=0.999, ratio=1.001, syslen=81099, reflen=81204)
checkpoint11.pt/test.bleu:BLEU4 = 33.74, 62.5/39.6/27.3/19.2 (BP=1.000, ratio=0.993, syslen=81744, reflen=81204)
checkpoint12.pt/test.bleu:BLEU4 = 33.37, 61.8/39.1/27.0/19.0 (BP=1.000, ratio=0.992, syslen=81892, reflen=81204)
checkpoint13.pt/test.bleu:BLEU4 = 34.07, 62.6/39.9/27.6/19.5 (BP=1.000, ratio=0.996, syslen=81534, reflen=81204)
checkpoint14.pt/test.bleu:BLEU4 = 33.81, 62.4/39.6/27.4/19.3 (BP=1.000, ratio=0.994, syslen=81685, reflen=81204)
checkpoint15.pt/test.bleu:BLEU4 = 33.78, 62.6/39.7/27.3/19.2 (BP=0.999, ratio=1.001, syslen=81110, reflen=81204)
checkpoint16.pt/test.bleu:BLEU4 = 34.09, 62.8/39.9/27.6/19.5 (BP=1.000, ratio=0.994, syslen=81723, reflen=81204)
checkpoint17.pt/test.bleu:BLEU4 = 33.94, 62.3/39.7/27.5/19.5 (BP=1.000, ratio=0.990, syslen=81988, reflen=81204)
checkpoint18.pt/test.bleu:BLEU4 = 34.43, 62.8/40.2/28.0/19.9 (BP=1.000, ratio=0.993, syslen=81811, reflen=81204)
checkpoint19.pt/test.bleu:BLEU4 = 34.14, 62.6/40.0/27.7/19.6 (BP=1.000, ratio=0.994, syslen=81661, reflen=81204)
checkpoint20.pt/test.bleu:BLEU4 = 34.05, 62.5/39.9/27.6/19.6 (BP=1.000, ratio=0.999, syslen=81314, reflen=81204)
checkpoint21.pt/test.bleu:BLEU4 = 34.20, 62.8/40.0/27.8/19.6 (BP=1.000, ratio=0.999, syslen=81259, reflen=81204)
checkpoint22.pt/test.bleu:BLEU4 = 34.13, 62.4/40.0/27.7/19.6 (BP=1.000, ratio=0.998, syslen=81331, reflen=81204)
checkpoint23.pt/test.bleu:BLEU4 = 34.31, 62.6/40.1/27.9/19.8 (BP=1.000, ratio=0.991, syslen=81972, reflen=81204)
checkpoint26.pt/test.bleu:BLEU4 = 34.11, 62.9/40.1/27.7/19.4 (BP=1.000, ratio=0.999, syslen=81260, reflen=81204)
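As a sanity check on these numbers (a hedged sketch, not fairseq's scoring code): BLEU4 is the geometric mean of the four logged n-gram precisions times the brevity penalty, so each line can be recomputed from its own components. For checkpoint18:

```python
import math

# Components logged for checkpoint18.pt above.
precisions = [0.628, 0.402, 0.280, 0.199]  # 1- to 4-gram precisions (62.8/40.2/28.0/19.9)
bp = 1.0                                   # brevity penalty; 1.0 since syslen >= reflen

# BLEU4 = BP * exp(mean of the log n-gram precisions)
bleu4 = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
print(f"BLEU4 = {100 * bleu4:.2f}")        # 34.44, matching the reported 34.43 up to rounding
```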
My question is: are the results I got within expectation? Should I wait for the lr=1.25 run to finish, or is there something wrong with my data/config?
Top GitHub Comments
My fault. I removed the lowercasing operation for the training data at one point, but forgot to remove it for the test data. Thank you very much.
Result on the corrected test set:
Training and validation loss:
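For anyone hitting the same issue: whatever normalization the training corpus received must be applied identically at test time. A minimal sketch of the pitfall (the preprocess helper and file names are illustrative, not part of the fairseq pipeline):

```python
def preprocess(line: str, lowercase: bool) -> str:
    # A cased model scored against lowercased references (or vice versa)
    # loses BLEU on every sentence containing an uppercase letter.
    return line.lower() if lowercase else line

LOWERCASE = False  # must be the same value for every split

for split in ("train", "valid", "test"):
    with open(f"{split}.en") as src, open(f"{split}.prep.en", "w") as out:
        for line in src:
            out.write(preprocess(line, LOWERCASE))
```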
Hi @dagarcia-nvidia, here: https://drive.google.com/open?id=1bFMhfhhMhhedPAPo0TDWBfVga8dFuTE1