Reproducing WMT 14 En-Fr (Transformer)
Hi,
I’m trying to reproduce the WMT 14 En-Fr results from the “Scaling NMT” paper. It worked out for WMT 14 En-De with the provided preprocessing script and hyper-parameters, but for WMT 14 En-Fr the perplexity keeps fluctuating instead of decreasing steadily. My command:
python3.6 train.py data-bin/wmt14_en_fr_joined_dict --arch transformer_vaswani_wmt_en_fr_big --share-all-embeddings --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 --lr 0.001 --min-lr 1e-09 --dropout 0.1 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 3584 --update-freq 16
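For reference, here is a back-of-the-envelope check of the effective batch size these flags imply; a minimal sketch, where the GPU count is an assumption since it is not part of the command line:

```python
# Effective batch size implied by --max-tokens and --update-freq.
# num_gpus is an assumption (not given by the command); adjust to the actual hardware.
max_tokens = 3584    # --max-tokens: target tokens per GPU per forward/backward pass
update_freq = 16     # --update-freq: gradient-accumulation steps per optimizer update
num_gpus = 8         # assumed GPU count

effective_tokens = max_tokens * update_freq * num_gpus
print(f"~{effective_tokens:,} tokens per optimizer update")  # ~458,752 with 8 GPUs
```

Keeping this product constant (for example, halving --max-tokens while doubling --update-freq) is the usual way to trade memory for wall-clock time without changing the optimization.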
Any suggestions for a better set of parameters?
Cheers, Stephan
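Since the reported symptom is a fluctuating perplexity, one thing worth checking is the learning rate the inverse_sqrt scheduler actually produces at each step with these flags. Below is a minimal sketch of that schedule, written from the flag values above as an approximation rather than a copy of fairseq's own code:

```python
# Approximate inverse_sqrt learning-rate schedule for the flags above:
# linear warmup from --warmup-init-lr to --lr over --warmup-updates steps,
# then decay proportional to the inverse square root of the update number.
def inverse_sqrt_lr(step, peak_lr=1e-3, warmup_init_lr=1e-7, warmup_updates=4000):
    if step < warmup_updates:
        return warmup_init_lr + step * (peak_lr - warmup_init_lr) / warmup_updates
    return peak_lr * (warmup_updates ** 0.5) * (step ** -0.5)

for step in (100, 1000, 4000, 20000, 100000):
    print(step, f"{inverse_sqrt_lr(step):.2e}")
```

If the loss oscillates around the end of warmup, lowering --lr or lengthening --warmup-updates are the usual first knobs to try.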
Issue Analytics
- State:
- Created 5 years ago
- Comments: 12 (4 by maintainers)
Top GitHub Comments
Good news! I could reproduce your results on WMT En-Fr after switching to PyTorch v0.4.1 (43.1 BLEU on newstest14).
However, the OOM rate is still around 0.10. To avoid that I tried a batch size of 4096 and got the same results.
Here is the log for batch size 5120:
It seems I’m still not able to reproduce your setup. Here are more details about my current setup:
It seems my OOM rate is higher than yours (0.10 vs. 0.02). Which CUDA version are you using? Any other ideas?
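As an aside, a quick way to sanity-check a reported score like the 43.1 BLEU above is to score the detokenized system output with sacreBLEU. The sketch below uses hypothetical file names for the generated translations and the newstest2014 reference; note that paper numbers for En-Fr are often tokenized BLEU (multi-bleu with compound splitting), so sacreBLEU will typically come out somewhat lower.

```python
# Minimal BLEU check with sacreBLEU (pip install sacrebleu).
# File names are placeholders for whatever fairseq-generate produced.
import sacrebleu

with open("hyp.detok.fr") as f:            # hypothetical detokenized system output
    hyps = [line.strip() for line in f]
with open("newstest2014.ref.fr") as f:     # hypothetical reference file
    refs = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.1f}")
```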