The code of OFA-base is inconsistent with the pre-trained checkpoint
Thanks for your awesome work. Something has been bothering me recently. When I continued to train OFA-base (I tried to collect all of OFA's pre-training data), I found that even a few training steps (about 10) made the performance of OFA-base worse. I compared the config stored in the checkpoint with the config in pretrain_ofa_base.sh and found many differences. What might affect the results?
In addition, I found a dimension inconsistency in the network: `decoder.image_position_idx` is `torch.Size([1026])` in the code but `torch.Size([1025])` in the checkpoint. Is this the reason for the performance decline?
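One way to pin down mismatches like this is to diff the checkpoint's tensors against the model's own state dict. A minimal sketch in plain PyTorch, assuming a fairseq-style checkpoint that nests its weights under a `model` key; the `ofa_base.pt` path and the already-constructed `model` instance are placeholders, not OFA's actual code:

```python
import torch

# Load the released checkpoint on CPU; "ofa_base.pt" is a placeholder path.
ckpt = torch.load("ofa_base.pt", map_location="cpu")
ckpt_state = ckpt.get("model", ckpt)  # fairseq nests weights under "model"

# `model` stands in for an already-constructed OFA-base instance.
model_state = model.state_dict()

# Report every tensor whose shape differs between code and checkpoint.
for name, tensor in model_state.items():
    if name not in ckpt_state:
        print(f"missing in ckpt: {name}")
    elif tuple(tensor.shape) != tuple(ckpt_state[name].shape):
        print(f"shape mismatch: {name} code={tuple(tensor.shape)} "
              f"ckpt={tuple(ckpt_state[name].shape)}")
```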
Top GitHub Comments
@zzhanghub @Charles-Xie Changing the dimension of `image_position_idx` to 1026 is to keep it consistent with the dimension of `embed_positions`; it shouldn't affect the performance. I guess you save the model during training, but even if the model parameters are not updated, the `batch_norm` running statistics will still be updated, which degrades the performance. You can set `--freeze-resnet` to fix the problem.
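For context on that last point: BatchNorm layers update their running mean and variance on every forward pass in training mode, even when no optimizer step touches their weights, because those statistics are buffers rather than parameters. A minimal PyTorch sketch of the effect; the OFA flag `--freeze-resnet` presumably guards against exactly this, but the demonstration below is generic, not OFA's actual code:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
print(bn.running_mean)  # starts at zeros

# Freeze the learnable affine parameters (weight/bias) as an optimizer would see it.
for p in bn.parameters():
    p.requires_grad = False

# A forward pass in train mode still updates the running statistics,
# because running_mean/running_var are buffers, not parameters.
with torch.no_grad():
    bn(torch.randn(8, 3, 16, 16))
print(bn.running_mean)  # changed

# Switching the module to eval mode freezes the running statistics.
bn.eval()
with torch.no_grad():
    bn(torch.randn(8, 3, 16, 16))
print(bn.running_mean)  # unchanged this time
```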
@zzhanghub For the detailed differences between the re-saved checkpoint and the ofa-base checkpoint: `neg_sample_dir` contains some sensitive information, so we manually deleted it before releasing the checkpoint.

Thank you !!!
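To check which config fields actually differ between the two checkpoints (and confirm that `neg_sample_dir` is the only deliberate deletion), one can diff the configs they store. A hedged sketch, assuming fairseq-style checkpoints that carry their training config under a `cfg` (newer, OmegaConf) or `args` (older, argparse Namespace) key; both paths are placeholders:

```python
import torch

def load_cfg(path):
    """Return the stored training config of a fairseq-style checkpoint as a dict."""
    ckpt = torch.load(path, map_location="cpu")
    cfg = ckpt.get("cfg") or ckpt.get("args")
    if isinstance(cfg, dict):
        return cfg
    if hasattr(cfg, "items"):   # OmegaConf DictConfig is mapping-like
        return dict(cfg.items())
    return vars(cfg)            # argparse Namespace

released = load_cfg("ofa_base_released.pt")  # placeholder path
resaved = load_cfg("ofa_base_resaved.pt")    # placeholder path

# Print every key whose value differs or exists on only one side.
for key in sorted(set(released) | set(resaved)):
    if released.get(key) != resaved.get(key):
        print(f"{key}: released={released.get(key)!r} resaved={resaved.get(key)!r}")
```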