[Meta] discussion about weight management: should we allow ported weights
Original post by @sayakpaul:
I agree that rescaling to [0, 1] is way simpler and easier to do, but I believe a significant number of models could be supported off-the-shelf with this consideration.
Expanding on this comment and summarizing what @LukeWood and I discussed offline.
While supporting a model with a training script to reach SoTA numbers is a great feature to have, I believe it also introduces a significant amount of friction and redundancy. Let me explain.
When the pre-trained parameters of a model are available officially but not in the expected format, I think it makes sense to just port those parameters so that they can be loaded into the Keras implementation. Refer to this repository as an example: https://github.com/sayakpaul/keras-convnext-conversion/. As far as I know, this strategy was followed for a number of models under keras.applications, for example ResNet-RS, EfficientNetV2, and ConvNeXt. This strategy also lets us seamlessly convert pre-trained checkpoints for bigger datasets like ImageNet-21k. Repeating the pre-training on such datasets would again be time-consuming and redundant work given that the official parameters are already available.
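To make the porting idea concrete, here is a minimal sketch of what such a conversion typically looks like, assuming a PyTorch-style checkpoint and a matching Keras re-implementation. The file name, the checkpoint key names, the layer names, and `build_convnext_tiny` are all placeholders for illustration, not actual APIs; the real mapping depends on the specific model.

```python
# Hypothetical weight-porting sketch: PyTorch checkpoint -> Keras model.
import numpy as np
import torch


def port_conv_kernel(torch_weight: torch.Tensor) -> np.ndarray:
    # PyTorch conv kernels are (out, in, h, w); Keras expects (h, w, in, out).
    return torch_weight.detach().numpy().transpose(2, 3, 1, 0)


def port_dense_kernel(torch_weight: torch.Tensor) -> np.ndarray:
    # PyTorch linear weights are (out, in); Keras Dense expects (in, out).
    return torch_weight.detach().numpy().transpose(1, 0)


# Checkpoint path and key names are illustrative; they vary per repository.
state_dict = torch.load("convnext_tiny.pth", map_location="cpu")
keras_model = build_convnext_tiny()  # hypothetical Keras re-implementation

# Map each Keras layer to the corresponding checkpoint entries and copy them.
keras_model.get_layer("stem_conv").set_weights([
    port_conv_kernel(state_dict["downsample_layers.0.0.weight"]),
    state_dict["downsample_layers.0.0.bias"].numpy(),
])
# ...repeat for the remaining layers, then save in the Keras format.
keras_model.save_weights("convnext_tiny_ported.h5")
```

The conversion repository linked above follows this general pattern, layer by layer, for the full ConvNeXt family.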
Furthermore, researchers have increasingly started pre-training with self-supervision and semi-supervision, and they are often able to surpass what’s possible with standard supervised pre-training. If we allow the addition of models populated with pre-trained official parameters, we can factor this in too. Otherwise, figuring out the nitty-gritty of a particular pre-training technique can be quite challenging. Of course, having the pre-training script (to ensure implementation correctness) should still be welcomed.
IMO, if the models exported in this way match the metrics reported in the official sources (official repositories, corresponding publications, etc.) and deliver sufficiently good performance on downstream tasks, that should suffice for validating the implementation. It also helps the community experiment with these models faster.
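A rough sketch of that validation step could look like the following, assuming the ported weights from above and the ImageNet-1k validation split via TFDS. The preprocessing shown (resize plus center crop, scale to [0, 1]) is only an example and must mirror the official eval recipe of the model being ported; `build_convnext_tiny` remains a placeholder.

```python
# Hypothetical eval sketch: check ported weights against reported metrics.
import tensorflow as tf
import tensorflow_datasets as tfds


def preprocess(image, label):
    # Placeholder transform; use the official eval transform of the model.
    image = tf.image.resize(image, (256, 256))
    image = tf.image.central_crop(image, 224 / 256)
    image = image / 255.0  # normalization scheme depends on the model
    return image, label


val_ds = (
    tfds.load("imagenet2012", split="validation", as_supervised=True)
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)
)

model = build_convnext_tiny()  # hypothetical Keras re-implementation
model.load_weights("convnext_tiny_ported.h5")
model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy", tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5)],
)

# Top-1 / top-5 here should closely match the numbers reported in the paper
# or the official repository before the ported weights are accepted.
results = model.evaluate(val_ds)
```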
Another point to consider is that not all models are trained with a single, catch-all recipe.
_Originally posted by @sayakpaul in https://github.com/keras-team/keras-cv/issues/476#issuecomment-1151219108_
Top GitHub Comments
This requires that we can always expect a training job/script for models below a certain threshold of FLOPs/size.
Here I was talking exclusively about the learning phase, not inference.
Right on! I agree.
Do you mean if the original parameters fail to produce the reported numbers? If so, I think a fair thing to do is to vet those parameters first before proceeding. In my experience, luckily, I haven’t encountered non-reproducible original parameters.
For evaluation (on ImageNet-1k validation set, let’s say), yes!
Oh absolutely. I can cite your work on ResNet-RS (amongst a few others) countless times.