
Could there be a bug in mixed precision?

See original GitHub issue

When I use torch 1.6.0 and accelerate 0.3.0 and set mixed precision to yes in the accelerate config, nothing happens (training still runs in full precision). If I instead set Accelerator(fp16=True) in the code, AMP is triggered, but the loss becomes inf right away.
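
For reference, a minimal sketch of that in-code setup, using the fp16 keyword from the accelerate 0.3.0 API that the reporter mentions; the toy model, data, and optimizer below are placeholders, not the reporter's actual code, and fp16 assumes a CUDA GPU:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# fp16=True is the in-code switch the reporter says triggers AMP
# (the accelerate config setting reportedly had no effect).
accelerator = Accelerator(fp16=True)

# Placeholder model and data, purely illustrative.
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 16), torch.randn(64, 1))
loader = DataLoader(dataset, batch_size=8)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # accelerate applies loss scaling when fp16 is enabled
    optimizer.step()
```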

But if I use the PyTorch way (i.e., calling autocast in the code myself), training is normal and AMP is enabled.
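
A minimal sketch of that manual route with torch 1.6's native torch.cuda.amp (again with a placeholder model and data, not the reporter's code):

```python
import torch
from torch.cuda.amp import autocast, GradScaler

device = torch.device("cuda")
model = torch.nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()  # scales the loss to avoid fp16 gradient underflow

for _ in range(10):
    inputs = torch.randn(8, 16, device=device)
    targets = torch.randn(8, 1, device=device)
    optimizer.zero_grad()
    with autocast():  # forward pass runs in mixed precision
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales gradients; skips the step on inf/NaN
    scaler.update()
```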

So I wonder if there is a possible bug in accelerate.

My environment is a single 2080 Ti on a local machine. The code with this problem is here.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 24 (9 by maintainers)

Top GitHub Comments

1 reaction
sgugger commented, Aug 5, 2021

I was able to investigate this more and I think I found the problem. The PR above should fix the issue; would you mind giving it a try?

1 reaction
sgugger commented, Aug 4, 2021

Thanks for the analysis and the example you provided. I’ll try to dig more into the differences tomorrow.

Read more comments on GitHub.

Top Results From Across the Web

Train With Mixed Precision - NVIDIA Documentation Center
This technique is called mixed-precision training since it uses both single- and half-precision representations.

Bug: Switching the spatial reference of a low-precision feature dataset may result in a mixed-precision feature dataset (Technical Article Details)

Mixed precision policy API - Keras
A dtype policy for a Keras layer. A dtype policy determines a layer's computation and variable dtypes. Each layer has a policy.

Mixed precision training with tf.keras on Weights & Biases
There are some configurations needed, however, in order to activate mixed precision training. We will see them in a later section.

Training with mixed precision: loss is NaN despite finite output ... (see the sketch after this list)
Good to know, thanks! But, if softmax was already using float32, then why did manually casting it to float32 (and then back to...
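
On that last result: a minimal sketch of the cast being discussed there, assuming a CUDA device and PyTorch's AMP autocast; the tensor shapes are illustrative, and whether the explicit cast is actually needed is exactly the question in that thread:

```python
import torch
from torch.cuda.amp import autocast

logits = torch.randn(4, 10, device="cuda")
with autocast():
    # Under autocast, softmax is among the ops PyTorch already runs in
    # float32, so the explicit .float() / .half() round trip shown here
    # is normally redundant; it only illustrates the cast in question.
    probs = torch.softmax(logits.float(), dim=-1).half()
print(probs.dtype)  # torch.float16 after the cast back
```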
