[RFC] introduce `config.trained_precision`
See original GitHub issue

🚀 Feature request
As we are discovering that `bf16`-pretrained models don't do well in an `fp16` "regime" (and surely vice versa), and that some models are pre-trained in `fp32` and surely won't do well in either `bf16` or `fp16`, and as this problem is only going to grow as more `bf16`-supporting hardware comes out, I propose we start requiring that the model tell the user which mode it was pretrained under.
So I suggest we add `config.trained_precision`, which currently would be one of `fp16`, `bf16`, `fp32`, `unknown`.
I haven't thought through how to derive this automatically during `save_pretrained`, but when porting checkpoints the porter can figure it out and manually set it in the conversion script.
For example, from what I understood, gpt-neo is `bf16` for all but the 2.7B version, which is `fp32`.
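Here is a rough sketch of what that manual step could look like when porting a checkpoint (nothing here exists yet: the `trained_precision` attribute is exactly the proposal, and the checkpoint name and output path are just examples):

```python
# A minimal sketch, not the actual conversion script: the porter stamps the
# proposed (not yet existing) `trained_precision` field onto the config by
# hand before saving the ported checkpoint.
from transformers import GPTNeoConfig, GPTNeoForCausalLM

config = GPTNeoConfig.from_pretrained("EleutherAI/gpt-neo-1.3B")
config.trained_precision = "bf16"  # known from the model authors; would be "fp32" for the 2.7B variant

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B", config=config)
model.save_pretrained("gpt-neo-1.3B-ported")  # extra config attributes end up in the saved config.json
```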
Top GitHub Comments
I think the name is good. I would leave it to a default of `"unknown"` for all existing models, so that we don't have to add it anywhere (especially when we don't have the info). I would personally not try to guess it too much and only set that information when we have it from the people who trained the model.

For 2, I don't think we should try to guess it either when people are not using the `Trainer`, and just focus on the trainer. We just need to add a `model.config.trained_precision = xxx` from the args and the env at the beginning of training, then the `save_pretrained` method, which also saves the config, will properly handle that.

For 3, I would only populate the popular models, for which we have the info.
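A rough sketch of what that could look like, assuming a hypothetical hook inside the `Trainer` (names and placement are illustrative, none of this exists yet):

```python
# Hypothetical sketch: derive the precision from the training arguments / env at
# the beginning of training and stamp it on the model config, so that
# save_pretrained(), which also saves the config, carries it along.
def infer_trained_precision(args) -> str:
    if getattr(args, "bf16", False):
        return "bf16"
    if getattr(args, "fp16", False):
        return "fp16"
    return "fp32"

# inside Trainer.train(), before the training loop (illustrative placement):
#     self.model.config.trained_precision = infer_trained_precision(self.args)
# later, Trainer.save_model() -> model.save_pretrained() writes the config,
# so the field travels with every saved checkpoint automatically.
```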
I think this feature would be welcome indeed and would save us a lot of trouble, as we've seen in the past. Regarding whether we want to have this in the model card or in the configuration, I guess it really depends on whether we want to enforce it with errors or warnings.

I think the main point of having that field is to actually warn the user when they're doing inference with a model trained in a different precision, and to that extent, having it in the configuration makes more sense.
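For illustration only, a minimal sketch of the kind of check that field would enable at load time; the helper name and call site are made up:

```python
import logging
import torch

logger = logging.getLogger(__name__)

# Hypothetical helper: compare the precision the checkpoint claims it was
# trained in against the dtype the user is loading it with, and warn on mismatch.
def warn_on_precision_mismatch(config, torch_dtype):
    dtype_to_name = {torch.float16: "fp16", torch.bfloat16: "bf16", torch.float32: "fp32"}
    trained = getattr(config, "trained_precision", "unknown")
    requested = dtype_to_name.get(torch_dtype)
    if trained != "unknown" and requested is not None and requested != trained:
        logger.warning(
            f"This checkpoint reports trained_precision={trained} but is being loaded "
            f"as {requested}; numerical quality may degrade."
        )

# e.g. something from_pretrained(..., torch_dtype=torch.float16) could call internally.
```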
I also think the configuration is there to detail how a checkpoint is configured: how the architecture fits the weights (hidden size, layers) and how it should be used (model type, architecture). I think it would make sense to have this as a configuration field, as not knowing it can result in an unusable checkpoint in other environments. I think that's different from other training-related arguments, such as `gradient_checkpointing`, which don't really make sense once ported to a different environment.