fp16 models getting auto converted to fp32 in .from_pretrained()
See original GitHub issue.
stas00 edited: this issue has nothing to do with DeepSpeed, but pure transformers.
Environment info
- transformers version: 4.6.1
- Platform: Linux-3.10.0-1127.13.1.el7.x86_64-x86_64-with-centos-7.7.1908-Core
- Python version: 3.6.8
- PyTorch version (GPU?): 1.6.0+cu92 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: Yes (not essential)
- Using distributed or parallel set-up in script?: Yes (not essential)
Who can help
Information
Model I am using (Bert, XLNet …): BertForMaskedLM
The problem arises when using:
- my own modified scripts (see details below)
The task I am working on is:
- my own task or dataset: masked LM (see details below)
To reproduce
Steps to reproduce the behavior:
- Fine-tune a BertForMaskedLM model in 16-bit (half) precision on any dataset using DeepSpeed and the Trainer (a rough sketch of this step follows the snippet below)
- Load the model and check the dtype using:
from transformers import BertTokenizer, BertForMaskedLM

# tokenizer_path / model_path point to the fine-tuned fp16 checkpoint directory
tokenizer = BertTokenizer.from_pretrained(tokenizer_path)
model = BertForMaskedLM.from_pretrained(model_path)
print(model.dtype)
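For context, step 1 above could look roughly like the following sketch. It is not the reporter's actual script: the base model name, output directory, ds_config.json contents, and the toy dataset are placeholders, and such a script is normally launched with the deepspeed launcher.

from transformers import (
    BertForMaskedLM,
    BertTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Toy stand-in for the real training data: a few tokenized sentences.
texts = [
    "DeepSpeed can train models in half precision.",
    "Masked language modeling hides random tokens.",
]
encodings = tokenizer(texts, truncation=True, padding="max_length", max_length=32)
train_dataset = [{k: v[i] for k, v in encodings.items()} for i in range(len(texts))]

training_args = TrainingArguments(
    output_dir="bert-mlm-fp16",      # checkpoint directory (placeholder)
    per_device_train_batch_size=2,
    num_train_epochs=1,
    fp16=True,                       # train in half/mixed precision
    deepspeed="ds_config.json",      # DeepSpeed config with "fp16": {"enabled": true}
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)
trainer.train()
trainer.save_model("bert-mlm-fp16")  # the checkpoint loaded in step 2 comes from here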
Expected behavior
This prints torch.float32 instead of the expected torch.float16. I was able to recover the original fp16 weights by calling model.half().
I think it would be helpful to highlight this forced autoconversion, either as a warning or in the from_pretrained() method's documentation, or to provide an additional argument for retaining fp16 weights. I am willing to pick this issue up; please let me know what the most appropriate fix would be.
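For reference, two possible ways to get fp16 weights back after loading; the checkpoint path below is a placeholder, and the torch_dtype argument only exists in transformers releases newer than the 4.6.1 used here:

import torch
from transformers import BertForMaskedLM

# 1) Load (weights are upcast to fp32), then cast back explicitly.
model = BertForMaskedLM.from_pretrained("path/to/fp16-checkpoint")
model = model.half()
print(model.dtype)   # torch.float16

# 2) Newer transformers releases accept a torch_dtype argument so the
#    model can be instantiated directly in half precision at load time.
model = BertForMaskedLM.from_pretrained("path/to/fp16-checkpoint", torch_dtype=torch.float16)
print(model.dtype)   # torch.float16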
Issue Analytics
- Created: 2 years ago
- Comments: 30 (25 by maintainers)
Top Results From Across the Web
- Models - Hugging Face: Under Pytorch a model normally gets instantiated with torch.float32 format. This can be an issue if one tries to load a model whose... (illustrated briefly after this list)
- Quantization — PyTorch 1.13 documentation: In most cases the model is trained in FP32 and then the model is converted to INT8. In addition, PyTorch also supports quantization...
- Compare Methods for Converting and Optimizing ... - Wandb: In this report, we're going to walk through how to convert trained PyTorch and Keras models to slimmer, leaner models for deployment.
- Jigsaw: PyTorch Lightning + FP16 + GPU/TPU + W&B | Kaggle: Half-precision floating point format (FP16) uses 16 bits, compared to 32 bits for single precision (FP32). Lowering the required memory enables training of ...
- Mixed precision training - fastai: We will need a function to convert all the layers of the model to FP16 precision except the BatchNorm-like layers (since those need...)
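The Hugging Face result above touches on the root cause discussed in this issue: PyTorch creates module parameters in float32 unless they are explicitly converted. A minimal illustration, not taken from any of the linked pages:

from torch import nn

layer = nn.Linear(4, 4)
print(layer.weight.dtype)   # torch.float32 -- the PyTorch default

layer = layer.half()        # explicit conversion to fp16
print(layer.weight.dtype)   # torch.float16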
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @asit2898, thanks for reporting your issue. I can help look at things from DeepSpeed's side.
Was the model fine-tuned with ZeRO enabled? From the DS config above it seems not, unless it is enabled somewhere on the HF side of things.
@stas00, does the from_pretrained codepath go through DeepSpeed's load_checkpoint(), or is the checkpoint logic all on HF's side?
To start, I did a quick experiment with DeepSpeed (without ZeRO) and examined the model parameter dtypes before and after deepspeed.initialize(). So far I haven't reproduced the issue: the parameter dtypes after deepspeed.initialize() come out as fp16 no matter the initial dtype of fp32 or fp16.

Yes, from_config uses just 1d. For your question, I'm not aware of such a situation existing.
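A rough sketch of the dtype check described in the comment above, with assumed config values; newer DeepSpeed releases accept a config dict via the config argument, and the script would normally run under the deepspeed launcher:

import deepspeed
from transformers import BertForMaskedLM

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},   # fp16 engine, no ZeRO (assumed)
}

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
print(next(model.parameters()).dtype)          # torch.float32 before initialize()

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
print(next(engine.module.parameters()).dtype)  # torch.float16 afterwards, regardless of the starting dtype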