[RFC] introduce `config.trained_precision`

See original GitHub issue

🚀 Feature request

We are discovering that bf16-pretrained models don't do well in the fp16 "regime" (and surely vice versa), and that some models are pretrained in fp32 and surely won't do well in either bf16 or fp16. The problem is going to grow as more bf16-capable hardware comes out, so I propose we start requiring that the model tell the user which mode it was pretrained in.

So I suggest we add `config.trained_precision`, which for now would be one of `fp16`, `bf16`, `fp32`, or `unknown`.
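A minimal sketch of what such a field could look like, assuming a plain dataclass stands in for the model configuration. `SketchConfig` and `ALLOWED_PRECISIONS` are hypothetical names used only for illustration, not part of any library. The default of `"unknown"` mirrors the suggestion, made in the comments, to leave existing models unlabeled, and the JSON round-trip shows the value travelling with the saved config.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical set of values for the proposed field.
ALLOWED_PRECISIONS = ("fp16", "bf16", "fp32", "unknown")


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a model config carrying the proposed field."""

    hidden_size: int = 768
    trained_precision: str = "unknown"  # existing checkpoints stay unlabeled

    def __post_init__(self) -> None:
        # Reject values outside the proposed set as early as possible.
        if self.trained_precision not in ALLOWED_PRECISIONS:
            raise ValueError(
                f"trained_precision must be one of {ALLOWED_PRECISIONS}, "
                f"got {self.trained_precision!r}"
            )

    def to_json(self) -> str:
        # Serialize like a config file so the field is saved with the checkpoint.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SketchConfig":
        return cls(**json.loads(text))


# A conversion script would set the value explicitly when the porter knows it.
cfg = SketchConfig(trained_precision="bf16")
restored = SketchConfig.from_json(cfg.to_json())
print(restored.trained_precision)  # bf16
```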

I haven't thought through how to derive this automatically during `save_pretrained`, but when porting checkpoints, the porter can figure it out and set it manually in the conversion script.

For example, from what I understand, GPT-Neo is bf16 for all but the 2.7B version, which is fp32.

@sgugger, @LysandreJik

Issue Analytics

  • State:open
  • Created 2 years ago
  • Comments:11 (11 by maintainers)

Top GitHub Comments

2 reactions
sgugger commented, Apr 13, 2021

I think the name is good. I would leave it at a default of "unknown" for all existing models, so that we don't have to add it anywhere (especially when we don't have the info). I would personally not try to guess it too much and would only set that information when we have it from the people who trained the model.

For 2, I don't think we should try to guess it either when people are not using the Trainer; let's just focus on the Trainer. We just need to add `model.config.trained_precision = xxx` from the args and the env at the beginning of training; the `save_pretrained` method, which also saves the config, will then handle it properly.

For 3, I would only populate the popular models, for which we have the info.
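A rough sketch of deriving the value from training arguments at the start of training. `SketchTrainingArgs` and `precision_from_args` are hypothetical; the `fp16`/`bf16` flags are assumed for illustration, the sketch ignores any environment-level settings the comment mentions, and treating "no mixed-precision flag" as fp32 is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class SketchTrainingArgs:
    """Hypothetical subset of training arguments: only the precision flags."""

    fp16: bool = False
    bf16: bool = False


def precision_from_args(args: SketchTrainingArgs) -> str:
    """Map mixed-precision flags to a trained_precision value (hypothetical helper)."""
    if args.fp16 and args.bf16:
        raise ValueError("fp16 and bf16 are mutually exclusive")
    if args.fp16:
        return "fp16"
    if args.bf16:
        return "bf16"
    # Assumption: no mixed-precision flag means full fp32 training.
    return "fp32"


# At the start of training, something along the lines of:
#   model.config.trained_precision = precision_from_args(args)
# so that saving the config afterwards records the value.
print(precision_from_args(SketchTrainingArgs(bf16=True)))  # bf16
```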

1 reaction
LysandreJik commented, Apr 13, 2021

I think this feature would be welcome indeed and would save us a lot of trouble, as we've seen in the past. Regarding whether we want to have this in the model card or in the configuration, I guess it really depends on whether we want to enforce it with errors or warnings.

I think the main point of having that field is to warn users when they're running inference with a model trained in a different precision, and to that extent, having it in the configuration makes more sense.

I also think the configuration is there to describe how a checkpoint is configured: how the architecture fits the weights (hidden size, layers) and how it should be used (model type, architecture). It would make sense to have this as a configuration field, since not knowing it can make a checkpoint unusable in other environments. That's different from other training-related arguments, such as `gradient_checkpointing`, which don't really make sense once the model is ported to a different environment.
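A minimal sketch of the inference-time warning described above, using Python's standard `warnings` module. `SAFE_INFERENCE` and `warn_on_precision_mismatch` are hypothetical names, and the compatibility table is an assumption: the thread only says fp32-trained models struggle in fp16/bf16 and that bf16 and fp16 don't mix, so treating upcasting to fp32 as safe is this sketch's own choice. `"unknown"` is skipped because there is nothing to check.

```python
import warnings

# Assumed compatibility table: inference precisions treated as safe for each
# trained precision. "unknown" is absent on purpose, so it never warns.
SAFE_INFERENCE = {
    "fp32": {"fp32"},
    "fp16": {"fp16", "fp32"},
    "bf16": {"bf16", "fp32"},
}


def warn_on_precision_mismatch(trained_precision: str, inference_precision: str) -> bool:
    """Warn if the inference precision looks risky; return True if a warning was emitted."""
    safe = SAFE_INFERENCE.get(trained_precision)
    if safe is None or inference_precision in safe:
        return False
    warnings.warn(
        f"Model was trained in {trained_precision} but is being run in "
        f"{inference_precision}; outputs may be degraded.",
        UserWarning,
    )
    return True


# Example: a bf16-trained checkpoint run in fp16 triggers one warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_precision_mismatch("bf16", "fp16")
print(len(caught))  # 1
```

A warning rather than an error keeps `"unknown"` and upcast cases usable, which matches the comment's framing of the field as a way to alert users rather than block them.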
