Train matching pretrained checkpoint for new dataset
Hi all,
Some classmates and I are trying to train a fusion text generation model with the fairseq code on a new dataset that we scraped and cleaned, but we are struggling to produce a usable pretrained checkpoint.
When I used the checkpoint (pretrained_checkpoint.pt) provided in the README, I got the following error:
RuntimeError: Error(s) in loading state_dict for FConvModelSelfAtt:
size mismatch for encoder.encoder.embed_tokens.weight: copying a param of torch.Size([3257, 256]) from checkpoint, where the shape is torch.Size([19025, 256]) in current model.
size mismatch for decoder.embed_tokens.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
size mismatch for decoder.fc3.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
size mismatch for decoder.fc3.bias: copying a param of torch.Size([51411]) from checkpoint, where the shape is torch.Size([104960]) in current model.
Here is the command I ran, where “dummy_source” and “dummy_target” are our input and output languages.
$ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
--clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
--decoder-attention True --encoder-attention False --criterion \
label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
--source-lang dummy_source --target-lang dummy_target --gated-attention True \
--self-attention True --project-input True --pretrained True \
--pretrained-checkpoint data-bin/models/pretrained_checkpoint.pt
But when I attempted to use a checkpoint created by training on our new dataset (checkpoint_best.pt), I got:
Exception: Cannot load model parameters from checkpoint, please ensure that the architectures match
Here is the command I ran for that:
$ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
--clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
--decoder-attention True --encoder-attention False --criterion \
label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
--source-lang dummy_source --target-lang dummy_target --gated-attention True \
--self-attention True --project-input True --pretrained True \
--pretrained-checkpoint checkpoints/checkpoint_best.pt
How can we go about training a usable pretrained checkpoint on a new dataset?
Issue Analytics
- State:
- Created 5 years ago
- Comments:12 (2 by maintainers)
Top GitHub Comments
You have to reuse the dictionary from the old dataset to build the new binary dataset. You’ll then end up with the same number of vocabulary classes, making the models match. The flags are in preprocess.py.
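A sketch of what that looks like, using fairseq preprocess.py's `--srcdict`/`--tgtdict` flags (the data paths here are hypothetical; point them at your own raw text and at the dictionary files the pretrained checkpoint was built with, typically `dict.<lang>.txt` in its binarized data directory):

```shell
# Rebinarize the new dataset, reusing the pretrained model's dictionaries
# instead of letting preprocess.py build fresh ones, so that vocabulary
# sizes (and hence embedding shapes) match the checkpoint.
python preprocess.py \
    --source-lang dummy_source --target-lang dummy_target \
    --trainpref data/train --validpref data/valid --testpref data/test \
    --srcdict data-bin/old/dict.dummy_source.txt \
    --tgtdict data-bin/old/dict.dummy_target.txt \
    --destdir data-bin/dummy
```

With the data binarized this way, the original train.py command should load the checkpoint without shape mismatches, since both models now share the same vocabulary.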
@jerinphilip does that mean my training hasn’t been accurate?