[Meta] discussion about weight management: should we allow ported weights
Original post by @sayakpaul:
I agree that rescaling to [0, 1] is way simpler and easier to do, but I believe a significant number of models could be supported off-the-shelf with this consideration.
Expanding on this comment and summarizing what @LukeWood and I discussed offline.
While supporting a model with a training script to reach SoTA numbers is a great feature to have, I believe it also introduces a significant amount of friction and redundancy. Let me explain.
When the pre-trained parameters of a model are available officially but not in the expected format, I think it makes sense to just port those parameters so that they can be loaded into the Keras implementation. Refer to this repository as an example: https://github.com/sayakpaul/keras-convnext-conversion/. As far as I know, this strategy was followed for a number of models under keras.applications, for example ResNet-RS, EfficientNetV2, and ConvNeXt. This strategy also lets us seamlessly convert pre-trained checkpoints for bigger datasets like ImageNet-21k. Repeating the pre-training on such datasets would again be time-consuming and redundant work given that the official parameters are already available.
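To make the porting idea concrete, here is a minimal sketch of what such a conversion typically looks like, assuming a PyTorch-style checkpoint and a matching Keras re-implementation. The file name, the checkpoint key names, the layer names, and `build_convnext_tiny` are all placeholders for illustration, not actual APIs; the real mapping depends on the specific model.

```python
# Hypothetical weight-porting sketch: PyTorch checkpoint -> Keras model.
import numpy as np
import torch


def port_conv_kernel(torch_weight: torch.Tensor) -> np.ndarray:
    # PyTorch conv kernels are (out, in, h, w); Keras expects (h, w, in, out).
    return torch_weight.detach().numpy().transpose(2, 3, 1, 0)


def port_dense_kernel(torch_weight: torch.Tensor) -> np.ndarray:
    # PyTorch linear weights are (out, in); Keras Dense expects (in, out).
    return torch_weight.detach().numpy().transpose(1, 0)


# Checkpoint path and key names are illustrative; they vary per repository.
state_dict = torch.load("convnext_tiny.pth", map_location="cpu")
keras_model = build_convnext_tiny()  # hypothetical Keras re-implementation

# Map each Keras layer to the corresponding checkpoint entries and copy them.
keras_model.get_layer("stem_conv").set_weights([
    port_conv_kernel(state_dict["downsample_layers.0.0.weight"]),
    state_dict["downsample_layers.0.0.bias"].numpy(),
])
# ...repeat for the remaining layers, then save in the Keras format.
keras_model.save_weights("convnext_tiny_ported.h5")
```

The conversion repository linked above follows this general pattern, layer by layer, for the full ConvNeXt family.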
Furthermore, researchers have increasingly started pre-training with self-supervision and semi-supervision, and they are often able to surpass what’s possible with standard supervised pre-training. If we allow the addition of models populated with pre-trained official parameters, we can factor this in too. Otherwise, figuring out the nitty-gritty of a particular pre-training technique can be quite challenging. Of course, having the pre-training script (to ensure implementation correctness) should still be welcomed.
IMO, if the models exported in this way match the metrics reported in the official sources (official repositories, corresponding publications, etc.) and deliver sufficiently good performance on downstream tasks, that should suffice for validating the implementation. It also helps the community experiment with these models faster.
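A rough sketch of that validation step could look like the following, assuming the ported weights from above and the ImageNet-1k validation split via TFDS. The preprocessing shown (resize plus center crop, scale to [0, 1]) is only an example and must mirror the official eval recipe of the model being ported; `build_convnext_tiny` remains a placeholder.

```python
# Hypothetical eval sketch: check ported weights against reported metrics.
import tensorflow as tf
import tensorflow_datasets as tfds


def preprocess(image, label):
    # Placeholder transform; use the official eval transform of the model.
    image = tf.image.resize(image, (256, 256))
    image = tf.image.central_crop(image, 224 / 256)
    image = image / 255.0  # normalization scheme depends on the model
    return image, label


val_ds = (
    tfds.load("imagenet2012", split="validation", as_supervised=True)
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)
)

model = build_convnext_tiny()  # hypothetical Keras re-implementation
model.load_weights("convnext_tiny_ported.h5")
model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy", tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5)],
)

# Top-1 / top-5 here should closely match the numbers reported in the paper
# or the official repository before the ported weights are accepted.
results = model.evaluate(val_ds)
```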
Another point to consider is that not all models are trained with a single, catch-all recipe.
_Originally posted by @sayakpaul in https://github.com/keras-team/keras-cv/issues/476#issuecomment-1151219108_
Top GitHub Comments
This requires that we can always expect a training job/script for models below a certain threshold of FLOPs/size.
Here I was talking exclusively about the learning phase, not inference.
Right on! I agree.
Do you mean if the original parameters fail to produce the reported numbers? If so, I think a fair thing to do is to vet those parameters first before proceeding. In my experience, luckily, I haven’t encountered non-reproducible original parameters.
For evaluation (on ImageNet-1k validation set, let’s say), yes!
Oh absolutely. I can cite your work on ResNet-RS (amongst a few others) countless times.