
Train matching pretrained checkpoint for new dataset

See original GitHub issue

Hi all,

Some classmates and I are trying to train a fusion text generation model using the fairseq code, but on a new dataset that we have scraped and cleaned. We are struggling with how to train a usable pretrained checkpoint.

When I used the checkpoint (pretrained_checkpoint.pt) provided in the README, I got the following error:

RuntimeError: Error(s) in loading state_dict for FConvModelSelfAtt:
        size mismatch for encoder.encoder.embed_tokens.weight: copying a param of torch.Size([3257, 256]) from checkpoint, where the shape is torch.Size([19025, 256]) in current model.
        size mismatch for decoder.embed_tokens.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
        size mismatch for decoder.fc3.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
        size mismatch for decoder.fc3.bias: copying a param of torch.Size([51411]) from checkpoint, where the shape is torch.Size([104960]) in current model.
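
The four mismatches pair up as source-vocabulary sizes (3257 vs. 19025) and target-vocabulary sizes (51411 vs. 104960): the checkpoint and the current dataset were binarized with different dictionaries. One way to confirm this is to compare the shapes stored in the checkpoint against the dictionary files. A minimal sketch, assuming the checkpoint keeps its weights under a 'model' key (as fairseq checkpoints do) and that the dictionaries sit where preprocess.py wrote them:

    $ python -c "import torch; \
        m = torch.load('data-bin/models/pretrained_checkpoint.pt', map_location='cpu')['model']; \
        print('checkpoint source vocab:', m['encoder.encoder.embed_tokens.weight'].shape[0]); \
        print('checkpoint target vocab:', m['decoder.embed_tokens.weight'].shape[0])"
    # each dictionary file lists one token per line; fairseq adds a handful of
    # special symbols (pad/eos/unk) on top of this count when loading it
    $ wc -l data-bin/dummy/dict.dummy_source.txt data-bin/dummy/dict.dummy_target.txt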

Here is the command that I ran, where “dummy_source” and “dummy_target” are our source and target languages.

    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained True \
        --pretrained-checkpoint data-bin/models/pretrained_checkpoint.pt

But when I attempted to use a checkpoint created by training on our new dataset (checkpoint_best.pt), I got:

Exception: Cannot load model parameters from checkpoint, please ensure that the architectures match

Here is the command I ran for that:

    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained True \
        --pretrained-checkpoint checkpoints/checkpoint_best.pt

How can we go about training a usable pretrained checkpoint on a new dataset?
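
For reference, the stories example in the fairseq README builds the pretrained checkpoint itself with the same training command but --pretrained False; only the second, fusion run adds --pretrained True with --pretrained-checkpoint. A sketch of that two-stage recipe on the dummy dataset (the --save-dir layout is an assumption for illustration):

    # Stage 1: train the base model that will serve as the pretrained checkpoint
    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained False \
        --save-dir checkpoints/pretrained

    # Stage 2: rerun with the same data and flags, but swap in
    #   --pretrained True --pretrained-checkpoint checkpoints/pretrained/checkpoint_best.pt

Both stages must see data binarized with the same dictionaries; otherwise the embedding shapes diverge exactly as in the traceback above (see the comment below on reusing dictionaries).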

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 12 (2 by maintainers)

Top GitHub Comments

1 reaction
jerinphilip commented, Nov 18, 2018

You have to reuse the dictionary from the old dataset to build the new binary dataset. You’ll thus end up with the same number of vocabulary classes, making the models match. The relevant flags are in preprocess.py.
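
Concretely, that means binarizing the new corpus while passing the old dictionary files to preprocess.py via its --srcdict and --tgtdict flags. A minimal sketch; the raw-text prefixes, dictionary locations, and output directory here are assumptions for illustration:

    $ python preprocess.py --source-lang dummy_source --target-lang dummy_target \
        --trainpref data/dummy/train --validpref data/dummy/valid --testpref data/dummy/test \
        --srcdict data-bin/old_dataset/dict.dummy_source.txt \
        --tgtdict data-bin/old_dataset/dict.dummy_target.txt \
        --destdir data-bin/dummy_reused

Tokens in the new corpus that are missing from the reused dictionaries are mapped to <unk> during binarization, so the vocabulary sizes, and with them the embedding and output-projection shapes, stay identical to the checkpoint’s.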

0 reactions
Arafat4341 commented, Jan 1, 2020

@jerinphilip does that mean my training hasn’t been accurate?

Read more comments on GitHub >

Top Results From Across the Web

  • Train matching pretrained checkpoint for new dataset #367
    Hi all, Some classmates and I are trying to train a fusion text generation model using the fairseq code, but on a new...
  • Training checkpoints | TensorFlow Core
    Use a tf.train.Checkpoint object to manually create a checkpoint, where the objects you want to checkpoint are set as attributes on the object....
  • Leveraging Pre-trained Language Model Checkpoints for ...
    Pre-trained language models have established a new level of performance on NLU tasks and more and more research has been built upon ...
  • Leveraging Pre-trained Checkpoints for Sequence Generation ...
    In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence model ...
  • Train a model — MMSegmentation 0.29.1 documentation
    To trade speed with GPU memory, you may pass in --cfg-options model.backbone.with_cp=True to enable checkpoint in backbone.
