
Train matching pretrained checkpoint for new dataset

See original GitHub issue

Hi all,

Some classmates and I are trying to train a fusion text generation model using the fairseq code, but on a new dataset that we have scraped and cleaned. We are struggling with how to train a usable pretrained checkpoint.

When I used the checkpoint (pretrained_checkpoint.pt) provided in the README, I got the following error:

RuntimeError: Error(s) in loading state_dict for FConvModelSelfAtt:
        size mismatch for encoder.encoder.embed_tokens.weight: copying a param of torch.Size([3257, 256]) from checkpoint, where the shape is torch.Size([19025, 256]) in current model.
        size mismatch for decoder.embed_tokens.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
        size mismatch for decoder.fc3.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
        size mismatch for decoder.fc3.bias: copying a param of torch.Size([51411]) from checkpoint, where the shape is torch.Size([104960]) in current model.
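
The four mismatches pair up as source-vocabulary sizes (3257 vs. 19025) and target-vocabulary sizes (51411 vs. 104960): the checkpoint and the current dataset were binarized with different dictionaries. One way to confirm this is to compare the shapes stored in the checkpoint against the dictionary files. A minimal sketch, assuming the checkpoint keeps its weights under a 'model' key (as fairseq checkpoints do) and that the dictionaries sit where preprocess.py wrote them:

    $ python -c "import torch; \
        m = torch.load('data-bin/models/pretrained_checkpoint.pt', map_location='cpu')['model']; \
        print('checkpoint source vocab:', m['encoder.encoder.embed_tokens.weight'].shape[0]); \
        print('checkpoint target vocab:', m['decoder.embed_tokens.weight'].shape[0])"
    # each dictionary file lists one token per line; fairseq adds a handful of
    # special symbols (pad/eos/unk) on top of this count when loading it
    $ wc -l data-bin/dummy/dict.dummy_source.txt data-bin/dummy/dict.dummy_target.txt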

Here is the command that I ran, where “dummy_source” and “dummy_target” are our source and target languages.

    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained True \
        --pretrained-checkpoint data-bin/models/pretrained_checkpoint.pt

But when I attempted to use a checkpoint created by training on our new dataset (checkpoint_best.pt), I got:

Exception: Cannot load model parameters from checkpoint, please ensure that the architectures match

Here is the command I ran for that:

    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained True \
        --pretrained-checkpoint checkpoints/checkpoint_best.pt

How can we go about training a usable pretrained checkpoint on a new dataset?
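
For reference, the stories example in the fairseq README builds the pretrained checkpoint itself with the same training command but --pretrained False; only the second, fusion run adds --pretrained True with --pretrained-checkpoint. A sketch of that two-stage recipe on the dummy dataset (the --save-dir layout is an assumption for illustration):

    # Stage 1: train the base model that will serve as the pretrained checkpoint
    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained False \
        --save-dir checkpoints/pretrained

    # Stage 2: rerun with the same data and flags, but swap in
    #   --pretrained True --pretrained-checkpoint checkpoints/pretrained/checkpoint_best.pt

Both stages must see data binarized with the same dictionaries; otherwise the embedding shapes diverge exactly as in the traceback above (see the comment below on reusing dictionaries).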

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 12 (2 by maintainers)

Top GitHub Comments

1 reaction
jerinphilip commented, Nov 18, 2018

You have to reuse the dictionary from the old dataset to build the new binary dataset. You’ll thus end up with the same number of vocabulary classes, making the models match. The relevant flags are in preprocess.py.
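
Concretely, that means binarizing the new corpus while passing the old dictionary files to preprocess.py via its --srcdict and --tgtdict flags. A minimal sketch; the raw-text prefixes, dictionary locations, and output directory here are assumptions for illustration:

    $ python preprocess.py --source-lang dummy_source --target-lang dummy_target \
        --trainpref data/dummy/train --validpref data/dummy/valid --testpref data/dummy/test \
        --srcdict data-bin/old_dataset/dict.dummy_source.txt \
        --tgtdict data-bin/old_dataset/dict.dummy_target.txt \
        --destdir data-bin/dummy_reused

Tokens in the new corpus that are missing from the reused dictionaries are mapped to <unk> during binarization, so the vocabulary sizes, and with them the embedding and output-projection shapes, stay identical to the checkpoint’s.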

0 reactions
Arafat4341 commented, Jan 1, 2020

@jerinphilip does that mean my training hasn’t been accurate?

Read more comments on GitHub >

Top Results From Across the Web

  • Train matching pretrained checkpoint for new dataset #367
    Hi all, Some classmates and I are trying to train a fusion text generation model using the fairseq code, but on a new...
  • Training checkpoints | TensorFlow Core
    Use a tf.train.Checkpoint object to manually create a checkpoint, where the objects you want to checkpoint are set as attributes on the object....
  • Leveraging Pre-trained Language Model Checkpoints for ...
    Pre-trained language models have established a new level of performance on NLU tasks and more and more research has been built upon ...
  • Leveraging Pre-trained Checkpoints for Sequence Generation ...
    In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence model ...
  • Train a model — MMSegmentation 0.29.1 documentation
    To trade speed with GPU memory, you may pass in --cfg-options model.backbone.with_cp=True to enable checkpoint in backbone.
