
fp16 models getting auto converted to fp32 in .from_pretrained()

See original GitHub issue

stas00 edited: this issue has nothing to do with DeepSpeed; it is purely a transformers issue


Environment info

  • transformers version: 4.6.1
  • Platform: Linux-3.10.0-1127.13.1.el7.x86_64-x86_64-with-centos-7.7.1908-Core
  • Python version: 3.6.8
  • PyTorch version (GPU?): 1.6.0+cu92 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: Yes (not essential)
  • Using distributed or parallel set-up in script?: Yes (not essential)

Who can help

@LysandreJik @sgugger

Information

Model I am using (Bert, XLNet …): BertForMaskedLM

The problem arises when using:

  • my own modified scripts: (give details below)

The task I am working on is:

  • my own task or dataset: (give details below) Masked LM

To reproduce

Steps to reproduce the behavior:

  1. Fine-tune a BertForMaskedLM model in 16-bit (low-precision) mode on any dataset using DeepSpeed and the Trainer
  2. Load the model and check the dtype using:

from transformers import BertTokenizer, BertForMaskedLM

# Paths point to the checkpoint produced by the fp16 fine-tuning run in step 1.
tokenizer = BertTokenizer.from_pretrained(tokenizer_path)
model = BertForMaskedLM.from_pretrained(model_path)
print(model.dtype)  # dtype of the loaded weights
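
For a self-contained check of the same load path without DeepSpeed, an fp16 checkpoint can be created directly and reloaded (a sketch only; the tiny BertConfig and the save_dir path are illustrative, not the reporter's setup):

from transformers import BertConfig, BertForMaskedLM

# Build a small model purely for illustration and convert it to fp16.
config = BertConfig(hidden_size=64, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=128)
model = BertForMaskedLM(config).half()
print(model.dtype)            # torch.float16
model.save_pretrained("save_dir")

# Reloading reports torch.float32, because from_pretrained instantiates
# the model in the default dtype before copying in the saved weights.
reloaded = BertForMaskedLM.from_pretrained("save_dir")
print(reloaded.dtype)         # torch.float32 on transformers 4.6.1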

Expected behavior

This outputs torch.float32 instead of the expected torch.float16. I was able to recover the original weights using model.half().

I think it would be helpful to highlight this forced autoconversion, either as a warning or in the from_pretrained() method’s documentation, or to provide an additional argument for retaining fp16 weights. I’m willing to pick this issue up; please let me know what the most appropriate fix would be.
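
As an interim workaround (a sketch, assuming the checkpoint on disk really is fp16), the weights can be converted back after loading; newer transformers releases also expose a torch_dtype argument on from_pretrained:

import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained(model_path)   # comes back as torch.float32
model = model.half()                                  # convert back to fp16
print(model.dtype)                                    # torch.float16

# On newer transformers releases the dtype can be requested at load time:
# model = BertForMaskedLM.from_pretrained(model_path, torch_dtype=torch.float16)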

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 30 (25 by maintainers)

Top GitHub Comments

2 reactions
ShadenSmith commented, Jun 9, 2021

Hi @asit2898 , thanks for reporting your issue. I can help look at things from DeepSpeed’s side.

Was the model fine-tuned with ZeRO enabled? From the DS config above it seems not, unless it is enabled somewhere on the HF side of things.

@stas00 , does the from_pretrained codepath go through DeepSpeed’s load_checkpoint(), or is the checkpoint logic all on HF’s side?

To start, I did a quick experiment with DeepSpeed (without ZeRO) and examined model parameter dtypes before and after deepspeed.initialize(). So far I haven’t reproduced the issue:

  • When fp16 is not enabled, the model’s dtype is unchanged (e.g., fp32 stays fp32 and fp16 stays fp16).
  • When fp16 is enabled, the model weights are fp16 after deepspeed.initialize(), regardless of whether the initial dtype was fp32 or fp16.
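
For reference, the before/after dtype check described in this comment might look roughly like the following (an illustrative sketch, not ShadenSmith’s actual script: it assumes a CUDA GPU, launching via the deepspeed launcher, and a DeepSpeed version whose initialize() accepts a config dict):

import deepspeed
from transformers import BertConfig, BertForMaskedLM

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
}

# Small model purely for illustration; it starts out in fp32.
model = BertForMaskedLM(BertConfig(hidden_size=64, num_hidden_layers=2,
                                   num_attention_heads=2, intermediate_size=128))
print({p.dtype for p in model.parameters()})          # {torch.float32}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)
print({p.dtype for p in engine.module.parameters()})  # {torch.float16} with fp16 enabled
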
1 reaction
sgugger commented, Jun 21, 2021

Yes, from_config uses just 1d. For your question, I’m not aware of such a situation existing.
