fp16 causes loss to be nan?

See original GitHub issue

Hi! When using the fp16 option, my loss becomes nan. I’m using a V100. Is there any other option I need to configure besides fp16=True in the trainer?

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions:1
  • Comments:37 (26 by maintainers)

Top GitHub Comments

1 reaction
lucidrains commented, Jul 19, 2022

@vedantroy you should let gwern know that you are training on danbooru 😆

1 reaction
lucidrains commented, Jul 19, 2022

@vedantroy yea, PyTorch autocast will take care of converting the types between the boundaries of code


Top Results From Across the Web

`--fp16` causing loss to go to Inf or NaN · Issue #169 - GitHub
This is just a general problem with fp16. It's much more sensitive to the parameters. The gradient clipping change with DeepSpeed is great, ......
FP16 gives NaN loss when using pre-trained model
I tried the new fp16 in native torch. However, when I continue my model training for my segmentation task I get loss as...
T5 fp16 issue is fixed - Transformers - Hugging Face Forums
We have just fixed the T5 fp16 issue for some of the T5 models! ... The rest of the models produce nan loss/logits....
Pytorch mixed precision causing discriminator loss to go to ...
Pytorch mixed precision causing discriminator loss to go to NaN in WGAN-GP ... NOTE: The FP16 chart ends on step ~140 because it...
Mixed precision training - fastai
Your activations or loss can overflow. The opposite problem from the gradients: it's easier to hit nan (or infinity) in FP16 precision, and...
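
The results above describe two opposite failure modes of FP16: activations or the loss can overflow to infinity (and then turn into nan), while small gradients can underflow to zero. The following is a minimal sketch of both effects using only the Python standard library, not the trainer from the issue. `to_fp16` is a hypothetical helper written for this illustration, and the scale factor of 1024 is an arbitrary choice; real mixed-precision setups (for example PyTorch's gradient scaler) typically adjust the scale dynamically.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration. struct's "e" format raises
    OverflowError for values too large for fp16; that case is mapped to
    +/-inf here, which is what fp16 arithmetic would produce.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


FP16_MAX = 65504.0  # largest finite half-precision value

# Overflow: an activation or loss above FP16_MAX becomes inf ...
big = to_fp16(70000.0)
# ... and a later inf - inf turns it into nan.
poisoned = big - big

# Underflow: a tiny gradient rounds to zero, so its update is lost.
tiny_grad = 1e-8
flushed = to_fp16(tiny_grad)

# Loss scaling: multiply before casting to fp16 so the value survives,
# then divide by the same factor in full precision.
scale = 1024.0  # arbitrary scale chosen for this sketch
scaled = to_fp16(tiny_grad * scale)
recovered = scaled / scale

print(big, poisoned, flushed, scaled, recovered)
```

Scaling helps with underflow but makes overflow more likely, which is one reason a loss that is stable in FP32 can still go to nan under FP16 unless the scale is lowered or gradients are clipped.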
