Checkpoint averaging does not improve translation quality on wmt16_en_de
I downloaded the dataset prepared by Google and ran the following commands to preprocess it:
$ TEXT=wmt16_en_de_bpe32k
$ mkdir $TEXT
$ tar -xzvf wmt16_en_de.tar.gz -C $TEXT
$ fairseq-preprocess --source-lang en --target-lang de \
--trainpref $TEXT/train.tok.clean.bpe.32000 \
--validpref $TEXT/newstest2013.tok.bpe.32000 \
--testpref $TEXT/newstest2014.tok.bpe.32000 \
--destdir data-bin/wmt16_en_de_bpe32k \
--nwordssrc 32768 --nwordstgt 32768 \
--joined-dictionary
I trained the model as follows:
CUDA_VISIBLE_DEVICES=2,3 python train.py data-bin/wmt16_en_de_bpe32k \
--arch transformer_wmt_en_de --share-all-embeddings \
--optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
--lr 0.0007 --min-lr 1e-09 \
--criterion label_smoothed_cross_entropy --label-smoothing 0.1 --weight-decay 0.0 \
--max-tokens 4096 --save-dir checkpoints/transformer_wmt16_en_de_bpe32k \
--update-freq 4 --no-progress-bar --log-format json --log-interval 50 \
--save-interval-updates 1000 --keep-interval-updates 20
After about 200k steps, I averaged the last 10 epoch checkpoints with:
python scripts/average_checkpoints.py --inputs checkpoints/transformer_wmt16_en_de_bpe32k --num-epoch-checkpoints 10 --output averaged_model.pt
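For readers unfamiliar with the averaging step, here is a minimal sketch of what checkpoint averaging computes: an element-wise mean of each parameter across checkpoints. It is an illustration under assumptions, not the code in scripts/average_checkpoints.py: plain Python lists stand in for tensors, and `average_state_dicts` is a hypothetical helper name.

```python
def average_state_dicts(state_dicts):
    """Element-wise mean of parameters across checkpoints.

    Each checkpoint is assumed to be a dict mapping a parameter name to a
    flat list of floats (a stand-in for a tensor).
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    names = list(state_dicts[0])
    # All checkpoints must come from the same architecture.
    for sd in state_dicts[1:]:
        if list(sd) != names:
            raise KeyError("checkpoints have different parameter names")
    n = len(state_dicts)
    return {
        name: [sum(vals) / n for vals in zip(*(sd[name] for sd in state_dicts))]
        for name in names
    }


# Two toy "checkpoints" with the same parameter names.
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
averaged = average_state_dicts(ckpts)
print(averaged)
```

Because the mean is taken per parameter, averaging is only meaningful for checkpoints from the same training run.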
I then tested the averaged model with:
subset=test
model=averaged_model.pt
CUDA_VISIBLE_DEVICES=1 python generate.py data-bin/wmt16_en_de_bpe32k \
--path $model --gen-subset $subset \
--beam 4 --batch-size 128 --remove-bpe --lenpen 0.6
I got the following BLEU score:
| Translated 3003 sentences (77175 tokens) in 25.3s (118.82 sentences/s, 3053.64 tokens/s)
| Generate test with beam=4: BLEU4 = 19.18, 54.9/26.5/14.6/8.5 (BP=0.931, ratio=0.933, syslen=58865, reflen=63078)
This is far worse than the originally reported results.
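As a sanity check on the numbers in that output line, the score can be recomputed from the four n-gram precisions and the two lengths using the standard BLEU definition (brevity penalty times the geometric mean of the precisions). This is a rough sketch; the function names are made up for illustration, and the precisions are the rounded values from the log, so the result only matches the reported 19.18 up to rounding.

```python
import math


def brevity_penalty(sys_len, ref_len):
    """Standard BLEU brevity penalty: 1 if the output is long enough,
    otherwise exp(1 - ref_len / sys_len)."""
    if sys_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / sys_len)


def bleu_from_precisions(precisions_pct, sys_len, ref_len):
    """BLEU (in percent) from n-gram precisions given in percent."""
    log_mean = sum(math.log(p / 100.0) for p in precisions_pct) / len(precisions_pct)
    return 100.0 * brevity_penalty(sys_len, ref_len) * math.exp(log_mean)


# Values copied from the generate.py output above.
bp = brevity_penalty(58865, 63078)
bleu = bleu_from_precisions([54.9, 26.5, 14.6, 8.5], 58865, 63078)
print(f"BP={bp:.3f} BLEU={bleu:.2f}")
```

The brevity penalty shows that the hypotheses are about 7% shorter than the references, so part of the gap comes from output length and not only from n-gram precision.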
The validation log looked normal:
{"epoch": 1, "valid_loss": "9.333", "valid_nll_loss": "8.389", "valid_ppl": "335.32", "valid_num_updates": "1000"}
{"epoch": 1, "valid_loss": "6.916", "valid_nll_loss": "5.576", "valid_ppl": "47.70", "valid_num_updates": "2000", "valid_best_loss": "6.91585"}
{"epoch": 1, "valid_loss": "5.726", "valid_nll_loss": "4.178", "valid_ppl": "18.10", "valid_num_updates": "3000", "valid_best_loss": "5.72631"}
{"epoch": 1, "valid_loss": "5.227", "valid_nll_loss": "3.626", "valid_ppl": "12.34", "valid_num_updates": "4000", "valid_best_loss": "5.227"}
{"epoch": 1, "valid_loss": "4.999", "valid_nll_loss": "3.378", "valid_ppl": "10.39", "valid_num_updates": "4751", "valid_best_loss": "4.9993"}
{"epoch": 2, "valid_loss": "4.956", "valid_nll_loss": "3.330", "valid_ppl": "10.06", "valid_num_updates": "5000", "valid_best_loss": "4.95558"}
{"epoch": 2, "valid_loss": "4.775", "valid_nll_loss": "3.136", "valid_ppl": "8.79", "valid_num_updates": "6000", "valid_best_loss": "4.77489"}
{"epoch": 2, "valid_loss": "4.674", "valid_nll_loss": "3.021", "valid_ppl": "8.12", "valid_num_updates": "7000", "valid_best_loss": "4.67431"}
{"epoch": 2, "valid_loss": "4.576", "valid_nll_loss": "2.926", "valid_ppl": "7.60", "valid_num_updates": "8000", "valid_best_loss": "4.5765"}
{"epoch": 2, "valid_loss": "4.525", "valid_nll_loss": "2.850", "valid_ppl": "7.21", "valid_num_updates": "9000", "valid_best_loss": "4.52456"}
{"epoch": 2, "valid_loss": "4.484", "valid_nll_loss": "2.824", "valid_ppl": "7.08", "valid_num_updates": "9502", "valid_best_loss": "4.48447"}
{"epoch": 3, "valid_loss": "4.450", "valid_nll_loss": "2.783", "valid_ppl": "6.88", "valid_num_updates": "10000", "valid_best_loss": "4.44979"}
{"epoch": 3, "valid_loss": "4.409", "valid_nll_loss": "2.742", "valid_ppl": "6.69", "valid_num_updates": "11000", "valid_best_loss": "4.40874"}
{"epoch": 3, "valid_loss": "4.376", "valid_nll_loss": "2.703", "valid_ppl": "6.51", "valid_num_updates": "12000", "valid_best_loss": "4.37565"}
{"epoch": 3, "valid_loss": "4.339", "valid_nll_loss": "2.666", "valid_ppl": "6.35", "valid_num_updates": "13000", "valid_best_loss": "4.33889"}
{"epoch": 3, "valid_loss": "4.320", "valid_nll_loss": "2.653", "valid_ppl": "6.29", "valid_num_updates": "14000", "valid_best_loss": "4.31983"}
{"epoch": 3, "valid_loss": "4.307", "valid_nll_loss": "2.638", "valid_ppl": "6.23", "valid_num_updates": "14253", "valid_best_loss": "4.30705"}
{"epoch": 4, "valid_loss": "4.304", "valid_nll_loss": "2.614", "valid_ppl": "6.12", "valid_num_updates": "15000", "valid_best_loss": "4.30354"}
{"epoch": 4, "valid_loss": "4.295", "valid_nll_loss": "2.616", "valid_ppl": "6.13", "valid_num_updates": "16000", "valid_best_loss": "4.29483"}
{"epoch": 4, "valid_loss": "4.255", "valid_nll_loss": "2.573", "valid_ppl": "5.95", "valid_num_updates": "17000", "valid_best_loss": "4.25462"}
{"epoch": 4, "valid_loss": "4.239", "valid_nll_loss": "2.553", "valid_ppl": "5.87", "valid_num_updates": "18000", "valid_best_loss": "4.23945"}
{"epoch": 4, "valid_loss": "4.235", "valid_nll_loss": "2.541", "valid_ppl": "5.82", "valid_num_updates": "19000", "valid_best_loss": "4.23489"}
{"epoch": 4, "valid_loss": "4.216", "valid_nll_loss": "2.537", "valid_ppl": "5.80", "valid_num_updates": "19004", "valid_best_loss": "4.21608"}
{"epoch": 5, "valid_loss": "4.209", "valid_nll_loss": "2.521", "valid_ppl": "5.74", "valid_num_updates": "20000", "valid_best_loss": "4.20865"}
{"epoch": 5, "valid_loss": "4.201", "valid_nll_loss": "2.516", "valid_ppl": "5.72", "valid_num_updates": "21000", "valid_best_loss": "4.20074"}
{"epoch": 5, "valid_loss": "4.196", "valid_nll_loss": "2.515", "valid_ppl": "5.72", "valid_num_updates": "22000", "valid_best_loss": "4.19649"}
{"epoch": 5, "valid_loss": "4.177", "valid_nll_loss": "2.495", "valid_ppl": "5.64", "valid_num_updates": "23000", "valid_best_loss": "4.17682"}
{"epoch": 5, "valid_loss": "4.170", "valid_nll_loss": "2.488", "valid_ppl": "5.61", "valid_num_updates": "23755", "valid_best_loss": "4.16978"}
{"epoch": 6, "valid_loss": "4.178", "valid_nll_loss": "2.483", "valid_ppl": "5.59", "valid_num_updates": "24000", "valid_best_loss": "4.16978"}
{"epoch": 6, "valid_loss": "4.160", "valid_nll_loss": "2.466", "valid_ppl": "5.52", "valid_num_updates": "25000", "valid_best_loss": "4.16007"}
{"epoch": 6, "valid_loss": "4.160", "valid_nll_loss": "2.470", "valid_ppl": "5.54", "valid_num_updates": "26000", "valid_best_loss": "4.16007"}
{"epoch": 6, "valid_loss": "4.147", "valid_nll_loss": "2.453", "valid_ppl": "5.48", "valid_num_updates": "27000", "valid_best_loss": "4.14741"}
{"epoch": 6, "valid_loss": "4.135", "valid_nll_loss": "2.443", "valid_ppl": "5.44", "valid_num_updates": "28000", "valid_best_loss": "4.13474"}
{"epoch": 6, "valid_loss": "4.128", "valid_nll_loss": "2.442", "valid_ppl": "5.43", "valid_num_updates": "28506", "valid_best_loss": "4.12798"}
{"epoch": 7, "valid_loss": "4.132", "valid_nll_loss": "2.441", "valid_ppl": "5.43", "valid_num_updates": "29000", "valid_best_loss": "4.12798"}
{"epoch": 7, "valid_loss": "4.116", "valid_nll_loss": "2.422", "valid_ppl": "5.36", "valid_num_updates": "30000", "valid_best_loss": "4.11599"}
{"epoch": 7, "valid_loss": "4.113", "valid_nll_loss": "2.420", "valid_ppl": "5.35", "valid_num_updates": "31000", "valid_best_loss": "4.11286"}
{"epoch": 7, "valid_loss": "4.112", "valid_nll_loss": "2.414", "valid_ppl": "5.33", "valid_num_updates": "32000", "valid_best_loss": "4.11226"}
{"epoch": 7, "valid_loss": "4.101", "valid_nll_loss": "2.414", "valid_ppl": "5.33", "valid_num_updates": "33000", "valid_best_loss": "4.10095"}
{"epoch": 7, "valid_loss": "4.099", "valid_nll_loss": "2.406", "valid_ppl": "5.30", "valid_num_updates": "33257", "valid_best_loss": "4.09882"}
{"epoch": 8, "valid_loss": "4.099", "valid_nll_loss": "2.404", "valid_ppl": "5.29", "valid_num_updates": "34000", "valid_best_loss": "4.09874"}
{"epoch": 8, "valid_loss": "4.101", "valid_nll_loss": "2.406", "valid_ppl": "5.30", "valid_num_updates": "35000", "valid_best_loss": "4.09874"}
{"epoch": 8, "valid_loss": "4.090", "valid_nll_loss": "2.397", "valid_ppl": "5.27", "valid_num_updates": "36000", "valid_best_loss": "4.08955"}
{"epoch": 8, "valid_loss": "4.091", "valid_nll_loss": "2.397", "valid_ppl": "5.27", "valid_num_updates": "37000", "valid_best_loss": "4.08955"}
{"epoch": 8, "valid_loss": "4.079", "valid_nll_loss": "2.384", "valid_ppl": "5.22", "valid_num_updates": "38000", "valid_best_loss": "4.07911"}
{"epoch": 8, "valid_loss": "4.087", "valid_nll_loss": "2.393", "valid_ppl": "5.25", "valid_num_updates": "38008", "valid_best_loss": "4.07911"}
{"epoch": 9, "valid_loss": "4.076", "valid_nll_loss": "2.385", "valid_ppl": "5.22", "valid_num_updates": "39000", "valid_best_loss": "4.07593"}
{"epoch": 9, "valid_loss": "4.076", "valid_nll_loss": "2.380", "valid_ppl": "5.20", "valid_num_updates": "40000", "valid_best_loss": "4.07563"}
{"epoch": 9, "valid_loss": "4.081", "valid_nll_loss": "2.379", "valid_ppl": "5.20", "valid_num_updates": "41000", "valid_best_loss": "4.07563"}
{"epoch": 9, "valid_loss": "4.061", "valid_nll_loss": "2.366", "valid_ppl": "5.15", "valid_num_updates": "42000", "valid_best_loss": "4.06122"}
{"epoch": 9, "valid_loss": "4.072", "valid_nll_loss": "2.374", "valid_ppl": "5.19", "valid_num_updates": "42759", "valid_best_loss": "4.06122"}
{"epoch": 10, "valid_loss": "4.062", "valid_nll_loss": "2.364", "valid_ppl": "5.15", "valid_num_updates": "43000", "valid_best_loss": "4.06122"}
{"epoch": 10, "valid_loss": "4.063", "valid_nll_loss": "2.360", "valid_ppl": "5.13", "valid_num_updates": "44000", "valid_best_loss": "4.06122"}
{"epoch": 10, "valid_loss": "4.058", "valid_nll_loss": "2.360", "valid_ppl": "5.13", "valid_num_updates": "45000", "valid_best_loss": "4.05814"}
{"epoch": 10, "valid_loss": "4.060", "valid_nll_loss": "2.357", "valid_ppl": "5.12", "valid_num_updates": "46000", "valid_best_loss": "4.05814"}
{"epoch": 10, "valid_loss": "4.047", "valid_nll_loss": "2.349", "valid_ppl": "5.10", "valid_num_updates": "47000", "valid_best_loss": "4.04742"}
{"epoch": 10, "valid_loss": "4.049", "valid_nll_loss": "2.351", "valid_ppl": "5.10", "valid_num_updates": "47510", "valid_best_loss": "4.04742"}
{"epoch": 11, "valid_loss": "4.049", "valid_nll_loss": "2.346", "valid_ppl": "5.09", "valid_num_updates": "48000", "valid_best_loss": "4.04742"}
{"epoch": 11, "valid_loss": "4.052", "valid_nll_loss": "2.355", "valid_ppl": "5.12", "valid_num_updates": "49000", "valid_best_loss": "4.04742"}
{"epoch": 11, "valid_loss": "4.063", "valid_nll_loss": "2.357", "valid_ppl": "5.12", "valid_num_updates": "50000", "valid_best_loss": "4.04742"}
{"epoch": 11, "valid_loss": "4.035", "valid_nll_loss": "2.337", "valid_ppl": "5.05", "valid_num_updates": "51000", "valid_best_loss": "4.03477"}
{"epoch": 11, "valid_loss": "4.036", "valid_nll_loss": "2.336", "valid_ppl": "5.05", "valid_num_updates": "52000", "valid_best_loss": "4.03477"}
{"epoch": 11, "valid_loss": "4.042", "valid_nll_loss": "2.341", "valid_ppl": "5.07", "valid_num_updates": "52261", "valid_best_loss": "4.03477"}
{"epoch": 12, "valid_loss": "4.047", "valid_nll_loss": "2.349", "valid_ppl": "5.09", "valid_num_updates": "53000", "valid_best_loss": "4.03477"}
{"epoch": 12, "valid_loss": "4.040", "valid_nll_loss": "2.342", "valid_ppl": "5.07", "valid_num_updates": "54000", "valid_best_loss": "4.03477"}
{"epoch": 12, "valid_loss": "4.035", "valid_nll_loss": "2.327", "valid_ppl": "5.02", "valid_num_updates": "55000", "valid_best_loss": "4.03459"}
{"epoch": 12, "valid_loss": "4.025", "valid_nll_loss": "2.326", "valid_ppl": "5.01", "valid_num_updates": "56000", "valid_best_loss": "4.02505"}
{"epoch": 12, "valid_loss": "4.024", "valid_nll_loss": "2.321", "valid_ppl": "5.00", "valid_num_updates": "57000", "valid_best_loss": "4.02434"}
{"epoch": 12, "valid_loss": "4.034", "valid_nll_loss": "2.330", "valid_ppl": "5.03", "valid_num_updates": "57012", "valid_best_loss": "4.02434"}
{"epoch": 13, "valid_loss": "4.028", "valid_nll_loss": "2.325", "valid_ppl": "5.01", "valid_num_updates": "58000", "valid_best_loss": "4.02434"}
{"epoch": 13, "valid_loss": "4.024", "valid_nll_loss": "2.323", "valid_ppl": "5.00", "valid_num_updates": "59000", "valid_best_loss": "4.02361"}
{"epoch": 13, "valid_loss": "4.037", "valid_nll_loss": "2.335", "valid_ppl": "5.05", "valid_num_updates": "60000", "valid_best_loss": "4.02361"}
{"epoch": 13, "valid_loss": "4.017", "valid_nll_loss": "2.316", "valid_ppl": "4.98", "valid_num_updates": "61000", "valid_best_loss": "4.01695"}
{"epoch": 13, "valid_loss": "4.021", "valid_nll_loss": "2.315", "valid_ppl": "4.98", "valid_num_updates": "61763", "valid_best_loss": "4.01695"}
{"epoch": 14, "valid_loss": "4.026", "valid_nll_loss": "2.323", "valid_ppl": "5.00", "valid_num_updates": "62000", "valid_best_loss": "4.01695"}
{"epoch": 14, "valid_loss": "4.017", "valid_nll_loss": "2.317", "valid_ppl": "4.98", "valid_num_updates": "63000", "valid_best_loss": "4.01695"}
{"epoch": 14, "valid_loss": "4.014", "valid_nll_loss": "2.314", "valid_ppl": "4.97", "valid_num_updates": "64000", "valid_best_loss": "4.01389"}
{"epoch": 14, "valid_loss": "4.020", "valid_nll_loss": "2.318", "valid_ppl": "4.99", "valid_num_updates": "65000", "valid_best_loss": "4.01389"}
{"epoch": 14, "valid_loss": "4.013", "valid_nll_loss": "2.312", "valid_ppl": "4.96", "valid_num_updates": "66000", "valid_best_loss": "4.01349"}
{"epoch": 14, "valid_loss": "4.017", "valid_nll_loss": "2.313", "valid_ppl": "4.97", "valid_num_updates": "66514", "valid_best_loss": "4.01349"}
{"epoch": 15, "valid_loss": "4.015", "valid_nll_loss": "2.313", "valid_ppl": "4.97", "valid_num_updates": "67000", "valid_best_loss": "4.01349"}
{"epoch": 15, "valid_loss": "4.012", "valid_nll_loss": "2.309", "valid_ppl": "4.96", "valid_num_updates": "68000", "valid_best_loss": "4.01173"}
{"epoch": 15, "valid_loss": "4.016", "valid_nll_loss": "2.313", "valid_ppl": "4.97", "valid_num_updates": "69000", "valid_best_loss": "4.01173"}
{"epoch": 15, "valid_loss": "4.005", "valid_nll_loss": "2.306", "valid_ppl": "4.95", "valid_num_updates": "70000", "valid_best_loss": "4.00522"}
{"epoch": 15, "valid_loss": "4.003", "valid_nll_loss": "2.307", "valid_ppl": "4.95", "valid_num_updates": "71000", "valid_best_loss": "4.00308"}
{"epoch": 15, "valid_loss": "4.008", "valid_nll_loss": "2.309", "valid_ppl": "4.95", "valid_num_updates": "71265", "valid_best_loss": "4.00308"}
{"epoch": 16, "valid_loss": "4.004", "valid_nll_loss": "2.302", "valid_ppl": "4.93", "valid_num_updates": "72000", "valid_best_loss": "4.00308"}
{"epoch": 16, "valid_loss": "4.014", "valid_nll_loss": "2.308", "valid_ppl": "4.95", "valid_num_updates": "73000", "valid_best_loss": "4.00308"}
{"epoch": 16, "valid_loss": "4.010", "valid_nll_loss": "2.308", "valid_ppl": "4.95", "valid_num_updates": "74000", "valid_best_loss": "4.00308"}
{"epoch": 16, "valid_loss": "4.009", "valid_nll_loss": "2.307", "valid_ppl": "4.95", "valid_num_updates": "75000", "valid_best_loss": "4.00308"}
{"epoch": 16, "valid_loss": "3.996", "valid_nll_loss": "2.297", "valid_ppl": "4.91", "valid_num_updates": "76000", "valid_best_loss": "3.99577"}
{"epoch": 16, "valid_loss": "4.002", "valid_nll_loss": "2.302", "valid_ppl": "4.93", "valid_num_updates": "76016", "valid_best_loss": "3.99577"}
{"epoch": 17, "valid_loss": "4.001", "valid_nll_loss": "2.298", "valid_ppl": "4.92", "valid_num_updates": "77000", "valid_best_loss": "3.99577"}
{"epoch": 17, "valid_loss": "4.004", "valid_nll_loss": "2.302", "valid_ppl": "4.93", "valid_num_updates": "78000", "valid_best_loss": "3.99577"}
{"epoch": 17, "valid_loss": "3.998", "valid_nll_loss": "2.296", "valid_ppl": "4.91", "valid_num_updates": "79000", "valid_best_loss": "3.99577"}
{"epoch": 17, "valid_loss": "3.999", "valid_nll_loss": "2.295", "valid_ppl": "4.91", "valid_num_updates": "80000", "valid_best_loss": "3.99577"}
{"epoch": 17, "valid_loss": "3.999", "valid_nll_loss": "2.293", "valid_ppl": "4.90", "valid_num_updates": "80767", "valid_best_loss": "3.99577"}
{"epoch": 18, "valid_loss": "4.004", "valid_nll_loss": "2.296", "valid_ppl": "4.91", "valid_num_updates": "81000", "valid_best_loss": "3.99577"}
{"epoch": 18, "valid_loss": "4.008", "valid_nll_loss": "2.299", "valid_ppl": "4.92", "valid_num_updates": "82000", "valid_best_loss": "3.99577"}
{"epoch": 18, "valid_loss": "4.001", "valid_nll_loss": "2.295", "valid_ppl": "4.91", "valid_num_updates": "83000", "valid_best_loss": "3.99577"}
{"epoch": 18, "valid_loss": "3.992", "valid_nll_loss": "2.287", "valid_ppl": "4.88", "valid_num_updates": "84000", "valid_best_loss": "3.99211"}
{"epoch": 18, "valid_loss": "3.994", "valid_nll_loss": "2.289", "valid_ppl": "4.89", "valid_num_updates": "85000", "valid_best_loss": "3.99211"}
{"epoch": 18, "valid_loss": "3.999", "valid_nll_loss": "2.297", "valid_ppl": "4.91", "valid_num_updates": "85518", "valid_best_loss": "3.99211"}
{"epoch": 19, "valid_loss": "3.991", "valid_nll_loss": "2.289", "valid_ppl": "4.89", "valid_num_updates": "86000", "valid_best_loss": "3.99073"}
{"epoch": 19, "valid_loss": "3.996", "valid_nll_loss": "2.290", "valid_ppl": "4.89", "valid_num_updates": "87000", "valid_best_loss": "3.99073"}
{"epoch": 19, "valid_loss": "3.988", "valid_nll_loss": "2.285", "valid_ppl": "4.87", "valid_num_updates": "88000", "valid_best_loss": "3.9883"}
{"epoch": 19, "valid_loss": "3.993", "valid_nll_loss": "2.289", "valid_ppl": "4.89", "valid_num_updates": "89000", "valid_best_loss": "3.9883"}
{"epoch": 19, "valid_loss": "3.986", "valid_nll_loss": "2.283", "valid_ppl": "4.87", "valid_num_updates": "90000", "valid_best_loss": "3.98635"}
{"epoch": 19, "valid_loss": "3.984", "valid_nll_loss": "2.279", "valid_ppl": "4.85", "valid_num_updates": "90269", "valid_best_loss": "3.98439"}
{"epoch": 20, "valid_loss": "3.991", "valid_nll_loss": "2.282", "valid_ppl": "4.86", "valid_num_updates": "91000", "valid_best_loss": "3.98439"}
{"epoch": 20, "valid_loss": "3.992", "valid_nll_loss": "2.289", "valid_ppl": "4.89", "valid_num_updates": "92000", "valid_best_loss": "3.98439"}
{"epoch": 20, "valid_loss": "3.984", "valid_nll_loss": "2.282", "valid_ppl": "4.86", "valid_num_updates": "93000", "valid_best_loss": "3.98412"}
{"epoch": 20, "valid_loss": "3.988", "valid_nll_loss": "2.281", "valid_ppl": "4.86", "valid_num_updates": "94000", "valid_best_loss": "3.98412"}
{"epoch": 20, "valid_loss": "3.978", "valid_nll_loss": "2.276", "valid_ppl": "4.84", "valid_num_updates": "95000", "valid_best_loss": "3.97801"}
{"epoch": 20, "valid_loss": "3.989", "valid_nll_loss": "2.284", "valid_ppl": "4.87", "valid_num_updates": "95020", "valid_best_loss": "3.97801"}
{"epoch": 21, "valid_loss": "3.988", "valid_nll_loss": "2.279", "valid_ppl": "4.85", "valid_num_updates": "96000", "valid_best_loss": "3.97801"}
{"epoch": 21, "valid_loss": "3.987", "valid_nll_loss": "2.277", "valid_ppl": "4.85", "valid_num_updates": "97000", "valid_best_loss": "3.97801"}
{"epoch": 21, "valid_loss": "3.983", "valid_nll_loss": "2.275", "valid_ppl": "4.84", "valid_num_updates": "98000", "valid_best_loss": "3.97801"}
{"epoch": 21, "valid_loss": "3.983", "valid_nll_loss": "2.279", "valid_ppl": "4.85", "valid_num_updates": "99000", "valid_best_loss": "3.97801"}
{"epoch": 21, "valid_loss": "3.971", "valid_nll_loss": "2.267", "valid_ppl": "4.81", "valid_num_updates": "99771", "valid_best_loss": "3.97125"}
{"epoch": 22, "valid_loss": "3.982", "valid_nll_loss": "2.275", "valid_ppl": "4.84", "valid_num_updates": "100000", "valid_best_loss": "3.97125"}
{"epoch": 22, "valid_loss": "3.978", "valid_nll_loss": "2.273", "valid_ppl": "4.83", "valid_num_updates": "101000", "valid_best_loss": "3.97125"}
{"epoch": 22, "valid_loss": "3.981", "valid_nll_loss": "2.272", "valid_ppl": "4.83", "valid_num_updates": "102000", "valid_best_loss": "3.97125"}
{"epoch": 22, "valid_loss": "3.975", "valid_nll_loss": "2.271", "valid_ppl": "4.83", "valid_num_updates": "103000", "valid_best_loss": "3.97125"}
{"epoch": 22, "valid_loss": "3.971", "valid_nll_loss": "2.267", "valid_ppl": "4.81", "valid_num_updates": "104000", "valid_best_loss": "3.97121"}
{"epoch": 22, "valid_loss": "3.980", "valid_nll_loss": "2.275", "valid_ppl": "4.84", "valid_num_updates": "104522", "valid_best_loss": "3.97121"}
{"epoch": 23, "valid_loss": "3.984", "valid_nll_loss": "2.277", "valid_ppl": "4.85", "valid_num_updates": "105000", "valid_best_loss": "3.97121"}
{"epoch": 23, "valid_loss": "3.979", "valid_nll_loss": "2.275", "valid_ppl": "4.84", "valid_num_updates": "106000", "valid_best_loss": "3.97121"}
{"epoch": 23, "valid_loss": "3.978", "valid_nll_loss": "2.271", "valid_ppl": "4.83", "valid_num_updates": "107000", "valid_best_loss": "3.97121"}
{"epoch": 23, "valid_loss": "3.977", "valid_nll_loss": "2.272", "valid_ppl": "4.83", "valid_num_updates": "108000", "valid_best_loss": "3.97121"}
{"epoch": 23, "valid_loss": "3.972", "valid_nll_loss": "2.265", "valid_ppl": "4.81", "valid_num_updates": "109000", "valid_best_loss": "3.97121"}
{"epoch": 23, "valid_loss": "3.975", "valid_nll_loss": "2.272", "valid_ppl": "4.83", "valid_num_updates": "109273", "valid_best_loss": "3.97121"}
{"epoch": 24, "valid_loss": "3.975", "valid_nll_loss": "2.265", "valid_ppl": "4.81", "valid_num_updates": "110000", "valid_best_loss": "3.97121"}
{"epoch": 24, "valid_loss": "3.975", "valid_nll_loss": "2.270", "valid_ppl": "4.82", "valid_num_updates": "111000", "valid_best_loss": "3.97121"}
{"epoch": 24, "valid_loss": "3.974", "valid_nll_loss": "2.262", "valid_ppl": "4.80", "valid_num_updates": "112000", "valid_best_loss": "3.97121"}
{"epoch": 24, "valid_loss": "3.974", "valid_nll_loss": "2.269", "valid_ppl": "4.82", "valid_num_updates": "113000", "valid_best_loss": "3.97121"}
{"epoch": 24, "valid_loss": "3.969", "valid_nll_loss": "2.265", "valid_ppl": "4.81", "valid_num_updates": "114000", "valid_best_loss": "3.96946"}
{"epoch": 24, "valid_loss": "3.971", "valid_nll_loss": "2.271", "valid_ppl": "4.83", "valid_num_updates": "114024", "valid_best_loss": "3.96946"}
{"epoch": 25, "valid_loss": "3.985", "valid_nll_loss": "2.284", "valid_ppl": "4.87", "valid_num_updates": "115000", "valid_best_loss": "3.96946"}
{"epoch": 25, "valid_loss": "3.981", "valid_nll_loss": "2.270", "valid_ppl": "4.82", "valid_num_updates": "116000", "valid_best_loss": "3.96946"}
{"epoch": 25, "valid_loss": "3.974", "valid_nll_loss": "2.265", "valid_ppl": "4.81", "valid_num_updates": "117000", "valid_best_loss": "3.96946"}
{"epoch": 25, "valid_loss": "3.972", "valid_nll_loss": "2.268", "valid_ppl": "4.82", "valid_num_updates": "118000", "valid_best_loss": "3.96946"}
{"epoch": 25, "valid_loss": "3.968", "valid_nll_loss": "2.267", "valid_ppl": "4.81", "valid_num_updates": "118775", "valid_best_loss": "3.96751"}
{"epoch": 26, "valid_loss": "3.977", "valid_nll_loss": "2.266", "valid_ppl": "4.81", "valid_num_updates": "119000", "valid_best_loss": "3.96751"}
{"epoch": 26, "valid_loss": "3.977", "valid_nll_loss": "2.268", "valid_ppl": "4.82", "valid_num_updates": "120000", "valid_best_loss": "3.96751"}
{"epoch": 26, "valid_loss": "3.972", "valid_nll_loss": "2.264", "valid_ppl": "4.80", "valid_num_updates": "121000", "valid_best_loss": "3.96751"}
{"epoch": 26, "valid_loss": "3.974", "valid_nll_loss": "2.265", "valid_ppl": "4.81", "valid_num_updates": "122000", "valid_best_loss": "3.96751"}
{"epoch": 26, "valid_loss": "3.965", "valid_nll_loss": "2.262", "valid_ppl": "4.80", "valid_num_updates": "123000", "valid_best_loss": "3.96485"}
{"epoch": 26, "valid_loss": "3.967", "valid_nll_loss": "2.262", "valid_ppl": "4.80", "valid_num_updates": "123526", "valid_best_loss": "3.96485"}
{"epoch": 27, "valid_loss": "3.978", "valid_nll_loss": "2.270", "valid_ppl": "4.82", "valid_num_updates": "124000", "valid_best_loss": "3.96485"}
{"epoch": 27, "valid_loss": "3.972", "valid_nll_loss": "2.266", "valid_ppl": "4.81", "valid_num_updates": "125000", "valid_best_loss": "3.96485"}
{"epoch": 27, "valid_loss": "3.968", "valid_nll_loss": "2.265", "valid_ppl": "4.81", "valid_num_updates": "126000", "valid_best_loss": "3.96485"}
{"epoch": 27, "valid_loss": "3.963", "valid_nll_loss": "2.256", "valid_ppl": "4.78", "valid_num_updates": "127000", "valid_best_loss": "3.96318"}
{"epoch": 27, "valid_loss": "3.961", "valid_nll_loss": "2.258", "valid_ppl": "4.78", "valid_num_updates": "128000", "valid_best_loss": "3.96079"}
{"epoch": 27, "valid_loss": "3.962", "valid_nll_loss": "2.252", "valid_ppl": "4.76", "valid_num_updates": "128277", "valid_best_loss": "3.96079"}
{"epoch": 28, "valid_loss": "3.969", "valid_nll_loss": "2.260", "valid_ppl": "4.79", "valid_num_updates": "129000", "valid_best_loss": "3.96079"}
{"epoch": 28, "valid_loss": "3.968", "valid_nll_loss": "2.259", "valid_ppl": "4.79", "valid_num_updates": "130000", "valid_best_loss": "3.96079"}
{"epoch": 28, "valid_loss": "3.967", "valid_nll_loss": "2.262", "valid_ppl": "4.80", "valid_num_updates": "131000", "valid_best_loss": "3.96079"}
{"epoch": 28, "valid_loss": "3.960", "valid_nll_loss": "2.254", "valid_ppl": "4.77", "valid_num_updates": "132000", "valid_best_loss": "3.95982"}
{"epoch": 28, "valid_loss": "3.964", "valid_nll_loss": "2.257", "valid_ppl": "4.78", "valid_num_updates": "133000", "valid_best_loss": "3.95982"}
{"epoch": 28, "valid_loss": "3.957", "valid_nll_loss": "2.253", "valid_ppl": "4.77", "valid_num_updates": "133028", "valid_best_loss": "3.95678"}
{"epoch": 29, "valid_loss": "3.967", "valid_nll_loss": "2.258", "valid_ppl": "4.78", "valid_num_updates": "134000", "valid_best_loss": "3.95678"}
{"epoch": 29, "valid_loss": "3.972", "valid_nll_loss": "2.264", "valid_ppl": "4.80", "valid_num_updates": "135000", "valid_best_loss": "3.95678"}
{"epoch": 29, "valid_loss": "3.962", "valid_nll_loss": "2.256", "valid_ppl": "4.78", "valid_num_updates": "136000", "valid_best_loss": "3.95678"}
{"epoch": 29, "valid_loss": "3.962", "valid_nll_loss": "2.254", "valid_ppl": "4.77", "valid_num_updates": "137000", "valid_best_loss": "3.95678"}
{"epoch": 29, "valid_loss": "3.960", "valid_nll_loss": "2.255", "valid_ppl": "4.77", "valid_num_updates": "137779", "valid_best_loss": "3.95678"}
{"epoch": 30, "valid_loss": "3.960", "valid_nll_loss": "2.251", "valid_ppl": "4.76", "valid_num_updates": "138000", "valid_best_loss": "3.95678"}
{"epoch": 30, "valid_loss": "3.969", "valid_nll_loss": "2.258", "valid_ppl": "4.78", "valid_num_updates": "139000", "valid_best_loss": "3.95678"}
{"epoch": 30, "valid_loss": "3.966", "valid_nll_loss": "2.259", "valid_ppl": "4.79", "valid_num_updates": "140000", "valid_best_loss": "3.95678"}
{"epoch": 30, "valid_loss": "3.972", "valid_nll_loss": "2.262", "valid_ppl": "4.80", "valid_num_updates": "141000", "valid_best_loss": "3.95678"}
{"epoch": 30, "valid_loss": "3.963", "valid_nll_loss": "2.255", "valid_ppl": "4.77", "valid_num_updates": "142000", "valid_best_loss": "3.95678"}
{"epoch": 30, "valid_loss": "3.953", "valid_nll_loss": "2.246", "valid_ppl": "4.74", "valid_num_updates": "142530", "valid_best_loss": "3.95294"}
{"epoch": 31, "valid_loss": "3.964", "valid_nll_loss": "2.258", "valid_ppl": "4.78", "valid_num_updates": "143000", "valid_best_loss": "3.95294"}
{"epoch": 31, "valid_loss": "3.969", "valid_nll_loss": "2.259", "valid_ppl": "4.79", "valid_num_updates": "144000", "valid_best_loss": "3.95294"}
{"epoch": 31, "valid_loss": "3.962", "valid_nll_loss": "2.252", "valid_ppl": "4.76", "valid_num_updates": "145000", "valid_best_loss": "3.95294"}
{"epoch": 31, "valid_loss": "3.966", "valid_nll_loss": "2.257", "valid_ppl": "4.78", "valid_num_updates": "146000", "valid_best_loss": "3.95294"}
{"epoch": 31, "valid_loss": "3.958", "valid_nll_loss": "2.254", "valid_ppl": "4.77", "valid_num_updates": "147000", "valid_best_loss": "3.95294"}
{"epoch": 31, "valid_loss": "3.964", "valid_nll_loss": "2.257", "valid_ppl": "4.78", "valid_num_updates": "147281", "valid_best_loss": "3.95294"}
{"epoch": 32, "valid_loss": "3.963", "valid_nll_loss": "2.255", "valid_ppl": "4.77", "valid_num_updates": "148000", "valid_best_loss": "3.95294"}
{"epoch": 32, "valid_loss": "3.967", "valid_nll_loss": "2.260", "valid_ppl": "4.79", "valid_num_updates": "149000", "valid_best_loss": "3.95294"}
{"epoch": 32, "valid_loss": "3.962", "valid_nll_loss": "2.253", "valid_ppl": "4.77", "valid_num_updates": "150000", "valid_best_loss": "3.95294"}
{"epoch": 32, "valid_loss": "3.961", "valid_nll_loss": "2.252", "valid_ppl": "4.76", "valid_num_updates": "151000", "valid_best_loss": "3.95294"}
{"epoch": 32, "valid_loss": "3.958", "valid_nll_loss": "2.248", "valid_ppl": "4.75", "valid_num_updates": "152000", "valid_best_loss": "3.95294"}
{"epoch": 32, "valid_loss": "3.957", "valid_nll_loss": "2.250", "valid_ppl": "4.76", "valid_num_updates": "152032", "valid_best_loss": "3.95294"}
{"epoch": 33, "valid_loss": "3.966", "valid_nll_loss": "2.261", "valid_ppl": "4.79", "valid_num_updates": "153000", "valid_best_loss": "3.95294"}
{"epoch": 33, "valid_loss": "3.965", "valid_nll_loss": "2.254", "valid_ppl": "4.77", "valid_num_updates": "154000", "valid_best_loss": "3.95294"}
{"epoch": 33, "valid_loss": "3.962", "valid_nll_loss": "2.252", "valid_ppl": "4.76", "valid_num_updates": "155000", "valid_best_loss": "3.95294"}
{"epoch": 33, "valid_loss": "3.955", "valid_nll_loss": "2.250", "valid_ppl": "4.76", "valid_num_updates": "156000", "valid_best_loss": "3.95294"}
{"epoch": 33, "valid_loss": "3.959", "valid_nll_loss": "2.252", "valid_ppl": "4.76", "valid_num_updates": "156783", "valid_best_loss": "3.95294"}
{"epoch": 34, "valid_loss": "3.964", "valid_nll_loss": "2.252", "valid_ppl": "4.76", "valid_num_updates": "157000", "valid_best_loss": "3.95294"}
{"epoch": 34, "valid_loss": "3.959", "valid_nll_loss": "2.252", "valid_ppl": "4.76", "valid_num_updates": "158000", "valid_best_loss": "3.95294"}
{"epoch": 34, "valid_loss": "3.959", "valid_nll_loss": "2.252", "valid_ppl": "4.76", "valid_num_updates": "159000", "valid_best_loss": "3.95294"}
{"epoch": 34, "valid_loss": "3.957", "valid_nll_loss": "2.251", "valid_ppl": "4.76", "valid_num_updates": "160000", "valid_best_loss": "3.95294"}
{"epoch": 34, "valid_loss": "3.952", "valid_nll_loss": "2.248", "valid_ppl": "4.75", "valid_num_updates": "161000", "valid_best_loss": "3.95165"}
{"epoch": 34, "valid_loss": "3.961", "valid_nll_loss": "2.249", "valid_ppl": "4.76", "valid_num_updates": "161534", "valid_best_loss": "3.95165"}
{"epoch": 35, "valid_loss": "3.957", "valid_nll_loss": "2.251", "valid_ppl": "4.76", "valid_num_updates": "162000", "valid_best_loss": "3.95165"}
{"epoch": 35, "valid_loss": "3.965", "valid_nll_loss": "2.256", "valid_ppl": "4.78", "valid_num_updates": "163000", "valid_best_loss": "3.95165"}
{"epoch": 35, "valid_loss": "3.960", "valid_nll_loss": "2.250", "valid_ppl": "4.76", "valid_num_updates": "164000", "valid_best_loss": "3.95165"}
{"epoch": 35, "valid_loss": "3.958", "valid_nll_loss": "2.249", "valid_ppl": "4.75", "valid_num_updates": "165000", "valid_best_loss": "3.95165"}
{"epoch": 35, "valid_loss": "3.958", "valid_nll_loss": "2.247", "valid_ppl": "4.75", "valid_num_updates": "166000", "valid_best_loss": "3.95165"}
{"epoch": 35, "valid_loss": "3.965", "valid_nll_loss": "2.257", "valid_ppl": "4.78", "valid_num_updates": "166285", "valid_best_loss": "3.95165"}
{"epoch": 36, "valid_loss": "3.955", "valid_nll_loss": "2.248", "valid_ppl": "4.75", "valid_num_updates": "167000", "valid_best_loss": "3.95165"}
{"epoch": 36, "valid_loss": "3.965", "valid_nll_loss": "2.257", "valid_ppl": "4.78", "valid_num_updates": "168000", "valid_best_loss": "3.95165"}
{"epoch": 36, "valid_loss": "3.958", "valid_nll_loss": "2.250", "valid_ppl": "4.76", "valid_num_updates": "169000", "valid_best_loss": "3.95165"}
{"epoch": 36, "valid_loss": "3.951", "valid_nll_loss": "2.244", "valid_ppl": "4.74", "valid_num_updates": "170000", "valid_best_loss": "3.95085"}
{"epoch": 36, "valid_loss": "3.961", "valid_nll_loss": "2.253", "valid_ppl": "4.77", "valid_num_updates": "171000", "valid_best_loss": "3.95085"}
{"epoch": 36, "valid_loss": "3.958", "valid_nll_loss": "2.248", "valid_ppl": "4.75", "valid_num_updates": "171036", "valid_best_loss": "3.95085"}
{"epoch": 37, "valid_loss": "3.956", "valid_nll_loss": "2.247", "valid_ppl": "4.75", "valid_num_updates": "172000", "valid_best_loss": "3.95085"}
{"epoch": 37, "valid_loss": "3.959", "valid_nll_loss": "2.251", "valid_ppl": "4.76", "valid_num_updates": "173000", "valid_best_loss": "3.95085"}
{"epoch": 37, "valid_loss": "3.959", "valid_nll_loss": "2.251", "valid_ppl": "4.76", "valid_num_updates": "174000", "valid_best_loss": "3.95085"}
{"epoch": 37, "valid_loss": "3.956", "valid_nll_loss": "2.249", "valid_ppl": "4.75", "valid_num_updates": "175000", "valid_best_loss": "3.95085"}
{"epoch": 37, "valid_loss": "3.954", "valid_nll_loss": "2.247", "valid_ppl": "4.75", "valid_num_updates": "175787", "valid_best_loss": "3.95085"}
{"epoch": 38, "valid_loss": "3.965", "valid_nll_loss": "2.254", "valid_ppl": "4.77", "valid_num_updates": "176000", "valid_best_loss": "3.95085"}
{"epoch": 38, "valid_loss": "3.960", "valid_nll_loss": "2.254", "valid_ppl": "4.77", "valid_num_updates": "177000", "valid_best_loss": "3.95085"}
{"epoch": 38, "valid_loss": "3.957", "valid_nll_loss": "2.251", "valid_ppl": "4.76", "valid_num_updates": "178000", "valid_best_loss": "3.95085"}
{"epoch": 38, "valid_loss": "3.954", "valid_nll_loss": "2.245", "valid_ppl": "4.74", "valid_num_updates": "179000", "valid_best_loss": "3.95085"}
{"epoch": 38, "valid_loss": "3.948", "valid_nll_loss": "2.242", "valid_ppl": "4.73", "valid_num_updates": "180000", "valid_best_loss": "3.94845"}
{"epoch": 38, "valid_loss": "3.954", "valid_nll_loss": "2.249", "valid_ppl": "4.75", "valid_num_updates": "180538", "valid_best_loss": "3.94845"}
{"epoch": 39, "valid_loss": "3.969", "valid_nll_loss": "2.256", "valid_ppl": "4.78", "valid_num_updates": "181000", "valid_best_loss": "3.94845"}
{"epoch": 39, "valid_loss": "3.963", "valid_nll_loss": "2.255", "valid_ppl": "4.77", "valid_num_updates": "182000", "valid_best_loss": "3.94845"}
{"epoch": 39, "valid_loss": "3.958", "valid_nll_loss": "2.247", "valid_ppl": "4.75", "valid_num_updates": "183000", "valid_best_loss": "3.94845"}
{"epoch": 39, "valid_loss": "3.948", "valid_nll_loss": "2.241", "valid_ppl": "4.73", "valid_num_updates": "184000", "valid_best_loss": "3.94835"}
{"epoch": 39, "valid_loss": "3.949", "valid_nll_loss": "2.242", "valid_ppl": "4.73", "valid_num_updates": "185000", "valid_best_loss": "3.94835"}
{"epoch": 39, "valid_loss": "3.950", "valid_nll_loss": "2.242", "valid_ppl": "4.73", "valid_num_updates": "185289", "valid_best_loss": "3.94835"}
{"epoch": 40, "valid_loss": "3.954", "valid_nll_loss": "2.243", "valid_ppl": "4.73", "valid_num_updates": "186000", "valid_best_loss": "3.94835"}
{"epoch": 40, "valid_loss": "3.956", "valid_nll_loss": "2.244", "valid_ppl": "4.74", "valid_num_updates": "187000", "valid_best_loss": "3.94835"}
{"epoch": 40, "valid_loss": "3.956", "valid_nll_loss": "2.251", "valid_ppl": "4.76", "valid_num_updates": "188000", "valid_best_loss": "3.94835"}
{"epoch": 40, "valid_loss": "3.956", "valid_nll_loss": "2.247", "valid_ppl": "4.75", "valid_num_updates": "189000", "valid_best_loss": "3.94835"}
{"epoch": 40, "valid_loss": "3.951", "valid_nll_loss": "2.246", "valid_ppl": "4.74", "valid_num_updates": "190000", "valid_best_loss": "3.94835"}
{"epoch": 40, "valid_loss": "3.945", "valid_nll_loss": "2.238", "valid_ppl": "4.72", "valid_num_updates": "190040", "valid_best_loss": "3.94524"}
{"epoch": 41, "valid_loss": "3.961", "valid_nll_loss": "2.249", "valid_ppl": "4.75", "valid_num_updates": "191000", "valid_best_loss": "3.94524"}
{"epoch": 41, "valid_loss": "3.956", "valid_nll_loss": "2.247", "valid_ppl": "4.75", "valid_num_updates": "192000", "valid_best_loss": "3.94524"}
{"epoch": 41, "valid_loss": "3.947", "valid_nll_loss": "2.241", "valid_ppl": "4.73", "valid_num_updates": "193000", "valid_best_loss": "3.94524"}
{"epoch": 41, "valid_loss": "3.949", "valid_nll_loss": "2.242", "valid_ppl": "4.73", "valid_num_updates": "194000", "valid_best_loss": "3.94524"}
{"epoch": 41, "valid_loss": "3.949", "valid_nll_loss": "2.242", "valid_ppl": "4.73", "valid_num_updates": "194791", "valid_best_loss": "3.94524"}
{"epoch": 42, "valid_loss": "3.956", "valid_nll_loss": "2.248", "valid_ppl": "4.75", "valid_num_updates": "195000", "valid_best_loss": "3.94524"}
{"epoch": 42, "valid_loss": "3.962", "valid_nll_loss": "2.248", "valid_ppl": "4.75", "valid_num_updates": "196000", "valid_best_loss": "3.94524"}
{"epoch": 42, "valid_loss": "3.954", "valid_nll_loss": "2.247", "valid_ppl": "4.75", "valid_num_updates": "197000", "valid_best_loss": "3.94524"}
{"epoch": 42, "valid_loss": "3.955", "valid_nll_loss": "2.243", "valid_ppl": "4.73", "valid_num_updates": "198000", "valid_best_loss": "3.94524"}
{"epoch": 42, "valid_loss": "3.948", "valid_nll_loss": "2.240", "valid_ppl": "4.72", "valid_num_updates": "199000", "valid_best_loss": "3.94524"}
{"epoch": 42, "valid_loss": "3.952", "valid_nll_loss": "2.251", "valid_ppl": "4.76", "valid_num_updates": "199542", "valid_best_loss": "3.94524"}
{"epoch": 43, "valid_loss": "3.956", "valid_nll_loss": "2.243", "valid_ppl": "4.73", "valid_num_updates": "200000", "valid_best_loss": "3.94524"}
{"epoch": 43, "valid_loss": "3.957", "valid_nll_loss": "2.247", "valid_ppl": "4.75", "valid_num_updates": "201000", "valid_best_loss": "3.94524"}
{"epoch": 43, "valid_loss": "3.957", "valid_nll_loss": "2.246", "valid_ppl": "4.74", "valid_num_updates": "202000", "valid_best_loss": "3.94524"}
{"epoch": 43, "valid_loss": "3.958", "valid_nll_loss": "2.247", "valid_ppl": "4.75", "valid_num_updates": "203000", "valid_best_loss": "3.94524"}
{"epoch": 43, "valid_loss": "3.951", "valid_nll_loss": "2.241", "valid_ppl": "4.73", "valid_num_updates": "204000", "valid_best_loss": "3.94524"}
{"epoch": 43, "valid_loss": "3.953", "valid_nll_loss": "2.248", "valid_ppl": "4.75", "valid_num_updates": "204293", "valid_best_loss": "3.94524"}
{"epoch": 44, "valid_loss": "3.954", "valid_nll_loss": "2.244", "valid_ppl": "4.74", "valid_num_updates": "205000", "valid_best_loss": "3.94524"}
{"epoch": 44, "valid_loss": "3.960", "valid_nll_loss": "2.249", "valid_ppl": "4.75", "valid_num_updates": "206000", "valid_best_loss": "3.94524"}
{"epoch": 44, "valid_loss": "3.956", "valid_nll_loss": "2.243", "valid_ppl": "4.73", "valid_num_updates": "207000", "valid_best_loss": "3.94524"}
{"epoch": 44, "valid_loss": "3.944", "valid_nll_loss": "2.233", "valid_ppl": "4.70", "valid_num_updates": "208000", "valid_best_loss": "3.94425"}
{"epoch": 44, "valid_loss": "3.953", "valid_nll_loss": "2.243", "valid_ppl": "4.73", "valid_num_updates": "209000", "valid_best_loss": "3.94425"}
{"epoch": 44, "valid_loss": "3.945", "valid_nll_loss": "2.238", "valid_ppl": "4.72", "valid_num_updates": "209044", "valid_best_loss": "3.94425"}
{"epoch": 45, "valid_loss": "3.950", "valid_nll_loss": "2.240", "valid_ppl": "4.72", "valid_num_updates": "210000", "valid_best_loss": "3.94425"}
{"epoch": 45, "valid_loss": "3.956", "valid_nll_loss": "2.245", "valid_ppl": "4.74", "valid_num_updates": "211000", "valid_best_loss": "3.94425"}
{"epoch": 45, "valid_loss": "3.947", "valid_nll_loss": "2.240", "valid_ppl": "4.72", "valid_num_updates": "212000", "valid_best_loss": "3.94425"}
{"epoch": 45, "valid_loss": "3.942", "valid_nll_loss": "2.236", "valid_ppl": "4.71", "valid_num_updates": "213000", "valid_best_loss": "3.94227"}
{"epoch": 45, "valid_loss": "3.942", "valid_nll_loss": "2.235", "valid_ppl": "4.71", "valid_num_updates": "213795", "valid_best_loss": "3.94227"}
{"epoch": 46, "valid_loss": "3.955", "valid_nll_loss": "2.243", "valid_ppl": "4.74", "valid_num_updates": "214000", "valid_best_loss": "3.94227"}
{"epoch": 46, "valid_loss": "3.951", "valid_nll_loss": "2.242", "valid_ppl": "4.73", "valid_num_updates": "215000", "valid_best_loss": "3.94227"}
{"epoch": 46, "valid_loss": "3.946", "valid_nll_loss": "2.239", "valid_ppl": "4.72", "valid_num_updates": "216000", "valid_best_loss": "3.94227"}
{"epoch": 46, "valid_loss": "3.951", "valid_nll_loss": "2.238", "valid_ppl": "4.72", "valid_num_updates": "217000", "valid_best_loss": "3.94227"}
{"epoch": 46, "valid_loss": "3.946", "valid_nll_loss": "2.236", "valid_ppl": "4.71", "valid_num_updates": "218000", "valid_best_loss": "3.94227"}
{"epoch": 46, "valid_loss": "3.949", "valid_nll_loss": "2.240", "valid_ppl": "4.72", "valid_num_updates": "218546", "valid_best_loss": "3.94227"}
{"epoch": 47, "valid_loss": "3.959", "valid_nll_loss": "2.251", "valid_ppl": "4.76", "valid_num_updates": "219000", "valid_best_loss": "3.94227"}
{"epoch": 47, "valid_loss": "3.952", "valid_nll_loss": "2.244", "valid_ppl": "4.74", "valid_num_updates": "220000", "valid_best_loss": "3.94227"}
{"epoch": 47, "valid_loss": "3.951", "valid_nll_loss": "2.241", "valid_ppl": "4.73", "valid_num_updates": "221000", "valid_best_loss": "3.94227"}
{"epoch": 47, "valid_loss": "3.948", "valid_nll_loss": "2.242", "valid_ppl": "4.73", "valid_num_updates": "222000", "valid_best_loss": "3.94227"}
{"epoch": 47, "valid_loss": "3.948", "valid_nll_loss": "2.237", "valid_ppl": "4.72", "valid_num_updates": "223000", "valid_best_loss": "3.94227"}
{"epoch": 47, "valid_loss": "3.943", "valid_nll_loss": "2.235", "valid_ppl": "4.71", "valid_num_updates": "223297", "valid_best_loss": "3.94227"}
{"epoch": 48, "valid_loss": "3.956", "valid_nll_loss": "2.243", "valid_ppl": "4.73", "valid_num_updates": "224000", "valid_best_loss": "3.94227"}
{"epoch": 48, "valid_loss": "3.949", "valid_nll_loss": "2.243", "valid_ppl": "4.73", "valid_num_updates": "225000", "valid_best_loss": "3.94227"}
{"epoch": 48, "valid_loss": "3.943", "valid_nll_loss": "2.234", "valid_ppl": "4.70", "valid_num_updates": "226000", "valid_best_loss": "3.94227"}
{"epoch": 48, "valid_loss": "3.944", "valid_nll_loss": "2.237", "valid_ppl": "4.71", "valid_num_updates": "227000", "valid_best_loss": "3.94227"}
{"epoch": 48, "valid_loss": "3.942", "valid_nll_loss": "2.237", "valid_ppl": "4.71", "valid_num_updates": "228000", "valid_best_loss": "3.94214"}
{"epoch": 48, "valid_loss": "3.947", "valid_nll_loss": "2.239", "valid_ppl": "4.72", "valid_num_updates": "228048", "valid_best_loss": "3.94214"}
{"epoch": 49, "valid_loss": "3.953", "valid_nll_loss": "2.244", "valid_ppl": "4.74", "valid_num_updates": "229000", "valid_best_loss": "3.94214"}
{"epoch": 49, "valid_loss": "3.954", "valid_nll_loss": "2.244", "valid_ppl": "4.74", "valid_num_updates": "230000", "valid_best_loss": "3.94214"}
{"epoch": 49, "valid_loss": "3.952", "valid_nll_loss": "2.245", "valid_ppl": "4.74", "valid_num_updates": "231000", "valid_best_loss": "3.94214"}
{"epoch": 49, "valid_loss": "3.946", "valid_nll_loss": "2.237", "valid_ppl": "4.72", "valid_num_updates": "232000", "valid_best_loss": "3.94214"}
{"epoch": 49, "valid_loss": "3.949", "valid_nll_loss": "2.244", "valid_ppl": "4.74", "valid_num_updates": "232799", "valid_best_loss": "3.94214"}
{"epoch": 50, "valid_loss": "3.948", "valid_nll_loss": "2.238", "valid_ppl": "4.72", "valid_num_updates": "233000", "valid_best_loss": "3.94214"}
I also checked issue #346 and followed exactly the same steps.
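To read a long validation log like the one above more easily, the JSON records can be parsed and ranked by `valid_loss`. The following is a minimal sketch, not part of fairseq: the function name `best_validation_entries` and the `sample` list are hypothetical, and it assumes each log line is a standalone JSON object with the field names shown in the log (`epoch`, `valid_loss`, `valid_num_updates`).

```python
import json


def best_validation_entries(log_lines, top_k=3):
    """Return the top_k validation records with the lowest valid_loss.

    Hypothetical helper: assumes one JSON object per line, with the
    field names that appear in the validation log above.
    """
    records = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON record
        rec = json.loads(line)
        if "valid_loss" not in rec:
            continue  # skip training-only records
        records.append({
            "epoch": int(rec["epoch"]),
            # the log stores numeric values as strings, so convert them
            "updates": int(rec["valid_num_updates"]),
            "valid_loss": float(rec["valid_loss"]),
        })
    # lowest loss first
    return sorted(records, key=lambda r: r["valid_loss"])[:top_k]


# Three records copied from the log above.
sample = [
    '{"epoch": 48, "valid_loss": "3.942", "valid_nll_loss": "2.237", "valid_ppl": "4.71", "valid_num_updates": "228000", "valid_best_loss": "3.94214"}',
    '{"epoch": 48, "valid_loss": "3.947", "valid_nll_loss": "2.239", "valid_ppl": "4.72", "valid_num_updates": "228048", "valid_best_loss": "3.94214"}',
    '{"epoch": 50, "valid_loss": "3.948", "valid_nll_loss": "2.238", "valid_ppl": "4.72", "valid_num_updates": "233000", "valid_best_loss": "3.94214"}',
]

best = best_validation_entries(sample, top_k=2)
for rec in best:
    print(rec["updates"], rec["valid_loss"])
```

In the excerpt above the validation loss moves only between roughly 3.94 and 3.97, so a ranking like this mainly confirms that training had plateaued; it does not by itself explain a low BLEU score.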
Issue Analytics
- State:
- Created 4 years ago
- Comments: 9 (5 by maintainers)
Top GitHub Comments
Can you share the output of the average_checkpoints.py script? Is it selecting the right checkpoints to average?
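One way to check which files would be picked is to list the save directory and apply the selection rule by hand. The sketch below is an illustration only and is not the code of `average_checkpoints.py`. It assumes epoch checkpoints are named like `checkpoint<epoch>.pt` and update checkpoints like `checkpoint_<epoch>_<updates>.pt`; the function name `select_last_checkpoints` and the example file names are hypothetical.

```python
import re

# Assumed naming patterns (not confirmed by this thread):
#   epoch checkpoints:  checkpoint48.pt
#   update checkpoints: checkpoint_48_228000.pt
EPOCH_RE = re.compile(r"checkpoint(\d+)\.pt")
UPDATE_RE = re.compile(r"checkpoint_\d+_(\d+)\.pt")


def select_last_checkpoints(filenames, n, update_based=False):
    """Return the n most recent checkpoint names, newest first.

    Hypothetical helper: ranks files by epoch number, or by update
    count when update_based is True.
    """
    pattern = UPDATE_RE if update_based else EPOCH_RE
    matches = []
    for name in filenames:
        m = pattern.fullmatch(name)
        if m:
            matches.append((int(m.group(1)), name))
    if len(matches) < n:
        raise ValueError(
            f"found {len(matches)} matching checkpoints, need {n}"
        )
    # sort by the numeric key, largest (most recent) first
    return [name for _, name in sorted(matches, reverse=True)[:n]]


# Hypothetical directory listing for illustration.
files = [
    "checkpoint47.pt",
    "checkpoint48.pt",
    "checkpoint_48_227000.pt",
    "checkpoint_48_228000.pt",
    "checkpoint_best.pt",
    "checkpoint_last.pt",
]

print(select_last_checkpoints(files, 2))
print(select_last_checkpoints(files, 2, update_based=True))
```

The training command above saves a checkpoint every 1000 updates (`--save-interval-updates 1000`), while the averaging command asks for epoch checkpoints (`--num-epoch-checkpoints 10`). Under the naming assumption above these are two different sets of files, so it is worth confirming which set the script actually reports.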
I pulled the latest update, and it works now.
Results of averaging 5 models, after compound splitting: