T5 fp16 forward yields nan
🐛 Bug
Information
Model I am using (Bert, XLNet …): T5
Language I am using the model on (English, Chinese …): English
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
I use pytorch-lightning to manage fp16. The following minimal example reproduces the issue:
from transformers import T5Model, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
# Casting the whole model to fp16 is enough to trigger the problem.
model = T5Model.from_pretrained("t5-base").cuda().half()

text = "hello world!"
inputs = tokenizer.encode(text, return_tensors="pt").cuda()
out = model(input_ids=inputs, decoder_input_ids=inputs)
# out[0] is the decoder's last hidden state; every value comes back NaN.
print(out[0][:, :, :10])
output:
tensor([[[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]]], device='cuda:0',
dtype=torch.float16, grad_fn=<SliceBackward>)
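For readers debugging similar NaNs, here is a small sketch (not part of the original report) that uses PyTorch forward hooks to locate the first module whose fp16 activations become non-finite. The model and input mirror the repro above; register_nan_hooks is a hypothetical helper name, not anything from the transformers API.

import torch
from transformers import T5Model, T5Tokenizer

def register_nan_hooks(model):
    # Attach a forward hook to every submodule that reports non-finite outputs.
    # Everything downstream of the first offender will also print, so the
    # first name printed is where the problem starts.
    def make_hook(name):
        def hook(module, inputs, output):
            tensors = output if isinstance(output, tuple) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and not torch.isfinite(t).all():
                    print(f"non-finite output in: {name}")
                    break
        return hook
    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5Model.from_pretrained("t5-base").cuda().half()
register_nan_hooks(model)
inputs = tokenizer.encode("hello world!", return_tensors="pt").cuda()
model(input_ids=inputs, decoder_input_ids=inputs)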
Expected behavior
Get non-NaN values in the output.
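A common workaround, sketched here as an assumption rather than the fix referenced later in the thread: keep the weights in fp32 and run inference under torch.cuda.amp.autocast, which applies fp16 only where it is considered numerically safe. Note that autocast requires PyTorch 1.6+, newer than the 1.4.0 reported below, and whether it avoids the overflow depends on the checkpoint.

import torch
from transformers import T5Model, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
# Leave the weights in fp32 instead of calling .half() on the whole model.
model = T5Model.from_pretrained("t5-base").cuda()

inputs = tokenizer.encode("hello world!", return_tensors="pt").cuda()
with torch.cuda.amp.autocast():
    out = model(input_ids=inputs, decoder_input_ids=inputs)
print(out[0][:, :, :10])  # should print finite values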
Environment info
- transformers version: 2.9.0
- Platform: Linux-4.15.0-88-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.6
- PyTorch version (GPU?): 1.4.0 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks for the detailed error description @binshengliu! I linked a PR that should fix it 😃
The same happens when fine-tuning GPT-Neo.