[RFC] introduce `config.trained_precision`
See original GitHub issue

🚀 Feature request
As we are discovering that `bf16`-pretrained models don't do well in an `fp16` "regime" (and surely vice versa), and that some models are pre-trained in `fp32` and surely won't do well in either `bf16` or `fp16`, and as this problem is only going to grow as more `bf16`-supporting hardware comes out, I propose we start requiring that the model tell the user which mode it was pretrained under.
So I suggest we add `config.trained_precision`, which currently would be one of `fp16`, `bf16`, `fp32`, `unknown`.
I haven't thought through how to derive this automatically during `save_pretrained`, but when porting checkpoints the porter can figure it out and manually set it in the conversion script.
For example, from what I understood, gpt-neo is `bf16` for all but the 2.7B version, which is `fp32`.
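Here is a rough sketch of what that manual step could look like when porting a checkpoint (nothing here exists yet: the `trained_precision` attribute is exactly the proposal, and the checkpoint name and output path are just examples):

```python
# A minimal sketch, not the actual conversion script: the porter stamps the
# proposed (not yet existing) `trained_precision` field onto the config by
# hand before saving the ported checkpoint.
from transformers import GPTNeoConfig, GPTNeoForCausalLM

config = GPTNeoConfig.from_pretrained("EleutherAI/gpt-neo-1.3B")
config.trained_precision = "bf16"  # known from the model authors; would be "fp32" for the 2.7B variant

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B", config=config)
model.save_pretrained("gpt-neo-1.3B-ported")  # extra config attributes end up in the saved config.json
```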
Top GitHub Comments
I think the name is good. I would leave it to a default of `"unknown"` for all existing models, so that we don't have to add it anywhere (especially when we don't have the info). I would personally not try to guess it too much and only set that information when we have it from the people who trained the model.

For 2, I don't think we should try to guess it either when people are not using the `Trainer`, and just focus on the trainer. We just need to add a `model.config.trained_precision = xxx` from the args and the env at the beginning of training, then the `save_pretrained` method, which also saves the config, will properly handle that.

For 3, I would only populate the popular models, for which we have the info.
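A rough sketch of what that could look like, assuming a hypothetical hook inside the `Trainer` (names and placement are illustrative, none of this exists yet):

```python
# Hypothetical sketch: derive the precision from the training arguments / env at
# the beginning of training and stamp it on the model config, so that
# save_pretrained(), which also saves the config, carries it along.
def infer_trained_precision(args) -> str:
    if getattr(args, "bf16", False):
        return "bf16"
    if getattr(args, "fp16", False):
        return "fp16"
    return "fp32"

# inside Trainer.train(), before the training loop (illustrative placement):
#     self.model.config.trained_precision = infer_trained_precision(self.args)
# later, Trainer.save_model() -> model.save_pretrained() writes the config,
# so the field travels with every saved checkpoint automatically.
```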
I think this feature would be welcome indeed and would save us a lot of trouble, as we've seen in the past. Regarding whether we want to have this in the model card or in the configuration, I guess it really depends on whether we want to enforce it with errors or warnings.

I think the main point of having that field is to actually warn the user when they're doing inference with a model trained in a different precision, and to that extent, having it in the configuration makes more sense.
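For illustration only, a minimal sketch of the kind of check that field would enable at load time; the helper name and call site are made up:

```python
import logging
import torch

logger = logging.getLogger(__name__)

# Hypothetical helper: compare the precision the checkpoint claims it was
# trained in against the dtype the user is loading it with, and warn on mismatch.
def warn_on_precision_mismatch(config, torch_dtype):
    dtype_to_name = {torch.float16: "fp16", torch.bfloat16: "bf16", torch.float32: "fp32"}
    trained = getattr(config, "trained_precision", "unknown")
    requested = dtype_to_name.get(torch_dtype)
    if trained != "unknown" and requested is not None and requested != trained:
        logger.warning(
            f"This checkpoint reports trained_precision={trained} but is being loaded "
            f"as {requested}; numerical quality may degrade."
        )

# e.g. something from_pretrained(..., torch_dtype=torch.float16) could call internally.
```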
I also think the configuration is there to detail how a checkpoint is configured: how the architecture fits the weights (hidden size, layers) and how it should be used (model type, architecture). I think it would make sense to have this as a configuration field, as not knowing it can result in an unusable checkpoint in other environments. I think that's different from other training-related arguments, such as `gradient_checkpointing`, which don't really make sense once ported to a different environment.