Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

The code of OFA-base is inconsistent with the pre-trained checkpoint

See original GitHub issue

Thanks for your awesome work. Something has been bothering me recently. When I continued to train OFA-base (I tried to collect all of OFA’s pre-training data), I found that even a few training steps (about 10) made the performance of OFA-base worse. I compared the config stored in the checkpoint with the config in pretrain_ofa_base.sh and found many differences. Which of them might affect the results?
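For context, this is roughly how I inspected the config stored in the checkpoint (a sketch: the path is a placeholder, and depending on the fairseq version the config sits under "cfg" or "args"):

    import torch

    # Fairseq-style checkpoints (OFA is built on fairseq) are plain dicts;
    # the training config is stored under "cfg" in newer versions or "args"
    # in older ones. The path below is a placeholder.
    ckpt = torch.load("checkpoints/ofa_base.pt", map_location="cpu")
    print(list(ckpt.keys()))                  # e.g. ['cfg', 'model', ...]

    cfg = ckpt.get("cfg") or ckpt.get("args")
    print(cfg)                                # compare against pretrain_ofa_base.sh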

In addition, I found a dimension inconsistency in the network: decoder.image_position_idx is torch.Size([1026]) in the code but torch.Size([1025]) in the checkpoint. Is this the reason for the drop in performance?
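This is roughly how I listed the shape mismatches (a sketch: model stands for the OFA-base instance built from the current code, and I assume a fairseq-style checkpoint that keeps the weights under the "model" key):

    import torch
    from torch import nn

    def report_shape_mismatches(model: nn.Module, ckpt_path: str) -> None:
        # Compare every state-dict entry (parameters and buffers) present in
        # both the in-code model and the saved checkpoint, and print the ones
        # whose shapes differ, e.g. decoder.image_position_idx.
        ckpt_state = torch.load(ckpt_path, map_location="cpu")["model"]
        code_state = model.state_dict()
        for name in sorted(code_state.keys() & ckpt_state.keys()):
            if code_state[name].shape != ckpt_state[name].shape:
                print(name, tuple(code_state[name].shape), "in code vs",
                      tuple(ckpt_state[name].shape), "in ckpt")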

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 10

Top GitHub Comments

1 reaction
logicwong commented, Sep 13, 2022

@zzhanghub @Charles-Xie The dimension of image_position_idx was changed to 1026 to be consistent with the dimension of embed_positions; it shouldn’t affect the performance. I guess you saved the model during training: even if the model parameters are not updated, the batch-norm statistics are still updated, which degrades the performance. You can set --freeze-resnet to fix the problem.
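A small self-contained illustration of the batch-norm point (plain PyTorch, not OFA code): the running statistics move on every forward pass in train mode, even without an optimizer step, and putting the module in eval mode freezes them, which is roughly what a freeze-resnet style option is meant to prevent:

    import torch
    from torch import nn

    # BatchNorm running statistics are updated on every forward pass in
    # train() mode, even when no optimizer step is ever taken.
    bn = nn.BatchNorm2d(3)
    x = torch.randn(8, 3, 32, 32) + 5.0       # data with a shifted mean
    with torch.no_grad():                      # no gradients, no optimizer
        bn(x)
    print(bn.running_mean)                     # already moved away from zeros

    # Freezing the statistics:
    bn.eval()                                  # stop updating running_mean/var
    for p in bn.parameters():
        p.requires_grad = False                # also freeze weight and bias
    with torch.no_grad():
        bn(x)
    print(bn.running_mean)                     # unchanged after the frozen pass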

@zzhanghub For the detailed differences between the re-saved ckpt and the ofa-base ckpt:

  1. That’s ok.
  2. To improve the pre-training, we use a larger sample_patch_num in the later stage of training. But in recent experiments, we found that using --patch-image-size=256 without sampling patches works better.
  3. As mentioned above, this does not affect the performance.
  4. The path of neg_sample_dir contains some sensitive information, so we manually deleted it before releasing the ckpt.
  5. The model was interrupted many times during pre-training, and we reset the learning rate a few times during this period, but this should have little effect on the performance.

0 reactions
zzhanghub commented, Oct 12, 2022

Thank you !!!

Read more comments on GitHub >

Top Results From Across the Web

“Some weights of the model checkpoint at bert-base ...”
The Linear layer weights are trained from the next sentence prediction (classification) objective during Bert pretraining. This output is usually *not* a good ...

Using pretrained models - Hugging Face Course
Let’s say we’re looking for a French-based model that can perform mask filling. Selecting the Camembert model. We select the camembert-base checkpoint to ...

CrossTransformers: spatially-aware few-shot transfer
Our experiments with CrossTransformers use no pretraining, although we use it for the experiments involving Prototypical Nets to be consistent ...

Release 1.0.0 Hayden Housen
TransformerSum is a library that aims to make it easy to train, evaluate, and use machine learning transformer models that perform automatic summarization ...

OFA: Unifying Architectures, Tasks, and Modalities Through a ...
Our code and models are publicly available at ... OFA is pretrained on the publicly available datasets of ... and Large sizes, OFABase ...
