fp16 models getting auto converted to fp32 in .from_pretrained()
See original GitHub issue.
stas00 edited: this issue has nothing to do with DeepSpeed, but pure transformers.
Environment info
- transformers version: 4.6.1
- Platform: Linux-3.10.0-1127.13.1.el7.x86_64-x86_64-with-centos-7.7.1908-Core
- Python version: 3.6.8
- PyTorch version (GPU?): 1.6.0+cu92 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: Yes (not essential)
- Using distributed or parallel set-up in script?: Yes (not essential)
Who can help
Information
Model I am using (Bert, XLNet …): BertForMaskedLM
The problem arises when using:
- my own modified scripts (see details below)
The task I am working on is:
- my own task or dataset: masked LM (see details below)
To reproduce
Steps to reproduce the behavior:
- Fine-tune a BertForMaskedLM model in 16-bit (half) precision on any dataset using DeepSpeed and the Trainer (a rough sketch of this step follows the snippet below)
- Load the model and check the dtype using:
from transformers import BertTokenizer, BertForMaskedLM

# tokenizer_path / model_path point to the fine-tuned fp16 checkpoint directory
tokenizer = BertTokenizer.from_pretrained(tokenizer_path)
model = BertForMaskedLM.from_pretrained(model_path)
print(model.dtype)
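For context, step 1 above could look roughly like the following sketch. It is not the reporter's actual script: the base model name, output directory, ds_config.json contents, and the toy dataset are placeholders, and such a script is normally launched with the deepspeed launcher.

from transformers import (
    BertForMaskedLM,
    BertTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Toy stand-in for the real training data: a few tokenized sentences.
texts = [
    "DeepSpeed can train models in half precision.",
    "Masked language modeling hides random tokens.",
]
encodings = tokenizer(texts, truncation=True, padding="max_length", max_length=32)
train_dataset = [{k: v[i] for k, v in encodings.items()} for i in range(len(texts))]

training_args = TrainingArguments(
    output_dir="bert-mlm-fp16",      # checkpoint directory (placeholder)
    per_device_train_batch_size=2,
    num_train_epochs=1,
    fp16=True,                       # train in half/mixed precision
    deepspeed="ds_config.json",      # DeepSpeed config with "fp16": {"enabled": true}
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)
trainer.train()
trainer.save_model("bert-mlm-fp16")  # the checkpoint loaded in step 2 comes from here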
Expected behavior
This prints torch.float32 instead of the expected torch.float16. I was able to recover the original fp16 weights by calling model.half().
I think it would be helpful to highlight this forced autoconversion, either as a warning or in the from_pretrained() method's documentation, or to provide an additional argument for retaining fp16 weights. I am willing to pick this issue up; please let me know what the most appropriate fix would be.
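For reference, two possible ways to get fp16 weights back after loading; the checkpoint path below is a placeholder, and the torch_dtype argument only exists in transformers releases newer than the 4.6.1 used here:

import torch
from transformers import BertForMaskedLM

# 1) Load (weights are upcast to fp32), then cast back explicitly.
model = BertForMaskedLM.from_pretrained("path/to/fp16-checkpoint")
model = model.half()
print(model.dtype)   # torch.float16

# 2) Newer transformers releases accept a torch_dtype argument so the
#    model can be instantiated directly in half precision at load time.
model = BertForMaskedLM.from_pretrained("path/to/fp16-checkpoint", torch_dtype=torch.float16)
print(model.dtype)   # torch.float16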
Issue Analytics
- Created: 2 years ago
- Comments: 30 (25 by maintainers)
Top Results From Across the Web
- Models - Hugging Face: Under Pytorch a model normally gets instantiated with torch.float32 format. This can be an issue if one tries to load a model whose... (illustrated briefly after this list)
- Quantization — PyTorch 1.13 documentation: In most cases the model is trained in FP32 and then the model is converted to INT8. In addition, PyTorch also supports quantization...
- Compare Methods for Converting and Optimizing ... - Wandb: In this report, we're going to walk through how to convert trained PyTorch and Keras models to slimmer, leaner models for deployment.
- Jigsaw: PyTorch Lightning + FP16 + GPU/TPU + W&B | Kaggle: Half-precision floating point format (FP16) uses 16 bits, compared to 32 bits for single precision (FP32). Lowering the required memory enables training of ...
- Mixed precision training - fastai: We will need a function to convert all the layers of the model to FP16 precision except the BatchNorm-like layers (since those need...)
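The Hugging Face result above touches on the root cause discussed in this issue: PyTorch creates module parameters in float32 unless they are explicitly converted. A minimal illustration, not taken from any of the linked pages:

from torch import nn

layer = nn.Linear(4, 4)
print(layer.weight.dtype)   # torch.float32 -- the PyTorch default

layer = layer.half()        # explicit conversion to fp16
print(layer.weight.dtype)   # torch.float16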
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @asit2898, thanks for reporting your issue. I can help look at things from DeepSpeed's side.
Was the model fine-tuned with ZeRO enabled? From the DS config above it seems not, unless it is enabled somewhere on the HF side of things.
@stas00, does the from_pretrained codepath go through DeepSpeed's load_checkpoint(), or is the checkpoint logic all on HF's side?
To start, I did a quick experiment with DeepSpeed (without ZeRO) and examined the model parameter dtypes before and after deepspeed.initialize(). So far I haven't reproduced the issue: the parameter dtypes after deepspeed.initialize() come out as fp16 no matter the initial dtype of fp32 or fp16.

Yes, from_config uses just 1d. For your question, I'm not aware of such a situation existing.
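A rough sketch of the dtype check described in the comment above, with assumed config values; newer DeepSpeed releases accept a config dict via the config argument, and the script would normally run under the deepspeed launcher:

import deepspeed
from transformers import BertForMaskedLM

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},   # fp16 engine, no ZeRO (assumed)
}

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
print(next(model.parameters()).dtype)          # torch.float32 before initialize()

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
print(next(engine.module.parameters()).dtype)  # torch.float16 afterwards, regardless of the starting dtype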