
How to reproduce the result of WMT14 en-de on transformer BASE model?

See original GitHub issue

Hi

I want to replicate the WMT14 En-De translation result of the Transformer base model from the paper “Attention Is All You Need”. Following the instructions here, I downloaded and preprocessed the data. Then I trained the model with this (see the preprocessing sketch after the command below):

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py data-bin/wmt16_en_de_bpe32k \
    --arch transformer_wmt_en_de --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
    --lr 0.0005 --min-lr 1e-09 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --weight-decay 0.0 \
    --max-tokens 4096 --save-dir checkpoints/en-de \
    --update-freq 2 --no-progress-bar --log-format json --log-interval 50 \
    --save-interval-updates 1000 --keep-interval-updates 20
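
For context, a rough sketch of the preprocessing step mentioned above, assuming the pre-encoded WMT'16 En-De BPE data and the preprocess.py flags fairseq shipped at the time (the file names follow the standard Google-preprocessed distribution and may differ in your setup):

# Binarize the BPE-encoded data with a joined source/target dictionary,
# which is required for --share-all-embeddings during training:
TEXT=wmt16_en_de_bpe32k
python preprocess.py --source-lang en --target-lang de \
    --trainpref $TEXT/train.tok.clean.bpe.32000 \
    --validpref $TEXT/newstest2013.tok.bpe.32000 \
    --testpref $TEXT/newstest2014.tok.bpe.32000 \
    --destdir data-bin/wmt16_en_de_bpe32k \
    --joined-dictionary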

I averaged the last 5 checkpoints (see the averaging sketch after the command below) and generated the translation with this:

model=model.pt
subset="test"

CUDA_VISIBLE_DEVICES=0 python generate.py data-bin/wmt16_en_de_bpe32k \
    --path checkpoints/$model --gen-subset $subset \
    --beam 4 --batch-size 128 --remove-bpe --lenpen 0.6
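
For reference, the checkpoint averaging mentioned above was presumably done with fairseq's scripts/average_checkpoints.py; a minimal sketch, assuming update-based checkpoints in the save directory used during training:

# Average the last 5 update-based checkpoints into a single model file
# (the output path matches $model in the generation command above):
python scripts/average_checkpoints.py \
    --inputs checkpoints/en-de \
    --num-update-checkpoints 5 \
    --output checkpoints/model.pt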

However, after about 120k updates, I got:
| Generate test with beam=4: BLEU4 = 26.38, 57.8/32.0/20.0/13.1 (BP=1.000, ratio=1.020, syslen=64352, reflen=63078)

After about 250k updates, I got:
| Generate test with beam=4: BLEU4 = 26.39, 57.8/32.0/20.0/13.1 (BP=1.000, ratio=1.017, syslen=64123, reflen=63078)

This is far from the 27.3 BLEU reported in “Attention Is All You Need”. Can you think of any reasons for the gap? Thanks a lot!

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 10
  • Comments: 23 (7 by maintainers)

Top GitHub Comments

myleott commented, Nov 7, 2018 (10 reactions)

Great! The last step to reproduce results from Vaswani et al. is to split compound words. This step gives a moderate increase in BLEU but is somewhat hacky. In general it’s preferable to report detokenized BLEU via tools like sacrebleu, although detok. BLEU is usually lower than tokenized BLEU. See this paper: https://arxiv.org/abs/1804.08771
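
For readers following along, a minimal sacrebleu invocation might look like this (gen.detok.out is a hypothetical file holding detokenized system output, one sentence per line):

# Score detokenized output against the official WMT14 En-De references
# (wmt14/full is the complete 3003-sentence test set used in the paper):
pip install sacrebleu
cat gen.detok.out | sacrebleu -t wmt14/full -l en-de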

Here is the script: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/get_ende_bleu.sh The compound splitting is near the bottom of the script.
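
The splitting itself boils down to a one-line Perl substitution that breaks hyphenated compounds into separate tokens; a sketch from memory (defer to the script above for the exact form), applied to both hypothesis and reference before scoring:

# Split hyphenated compounds so each part is scored as its own token:
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < hyp.tok > hyp.tok.atat
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < ref.tok > ref.tok.atat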

gushu333 commented, Nov 7, 2018 (8 reactions)

That’s so interesting! After using this script, I got:
BLEU = 27.70, 58.9/33.4/21.2/14.1 (BP=1.000, ratio=1.015, hyp_len=65442, ref_len=64496)
Meanwhile, I found that the averaged model at about 180k updates had already reached:
BLEU = 27.37, 58.6/33.0/21.0/13.8 (BP=1.000, ratio=1.016, hyp_len=65500, ref_len=64496)
Thanks again for your help! 👍

Read more comments on GitHub >

