APEX Gradient overflow

See original GitHub issue

When I train the Swin network with Apex opt level O1, I get a gradient overflow:

Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 131072.0
[2021-04-25 05:35:19 swin_base_patch4_window7_224](main_prune.py 310): INFO Train: [0/300][4050/5004] eta 0:17:06 lr 0.000500 time 1.1737 (1.0765) loss 3.2572 (3.3279) grad_norm 1.0323 (nan) mem 4814MB

Is this normal?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
zeliu98 commented, Apr 26, 2021

Hi, there is no need to worry. This is normal, since Apex uses a dynamic loss scale. You can refer to the docs for more detail.
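To illustrate what a dynamic loss scale does, here is a minimal pure-Python sketch. It is not Apex's implementation: the class `DynamicLossScaler`, its parameters (`init_scale`, `factor`, `window`), and the toy gradient values are all hypothetical and chosen only so the behavior is easy to follow. The idea it shows is that a step with inf/NaN gradients is skipped and the scale is halved, and the scale is raised again after a run of clean steps.

```python
import math


class DynamicLossScaler:
    """Hypothetical, simplified stand-in for a dynamic loss scaler (not Apex's code)."""

    def __init__(self, init_scale=2.0 ** 18, factor=2.0, window=3):
        self.scale = init_scale    # current loss scale
        self.factor = factor       # shrink/grow multiplier
        self.window = window       # clean steps required before growing the scale
        self.clean_steps = 0

    def step(self, grads):
        """Return True if the optimizer step should run, False if it is skipped."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            # Overflow: skip this step and lower the scale.
            self.scale /= self.factor
            self.clean_steps = 0
            print(f"Gradient overflow. Skipping step, reducing loss scale to {self.scale}")
            return False
        self.clean_steps += 1
        if self.clean_steps == self.window:
            # Enough clean steps in a row: try a larger scale again.
            self.scale *= self.factor
            self.clean_steps = 0
        return True


scaler = DynamicLossScaler()
# Toy "gradients" per step; the second step contains an inf to force an overflow.
batches = [[0.1, 0.2], [float("inf"), 0.3], [0.1, 0.1], [0.2, 0.1], [0.3, 0.2]]
history = []
for grads in batches:
    applied = scaler.step(grads)
    history.append((applied, scaler.scale))
```

In this sketch only the second step is skipped, and the scale recovers after three clean steps. An occasional skipped step of this kind is the expected cost of probing for the largest usable scale.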

0 reactions
zeliu98 commented, Apr 27, 2021

Hi, please refer to this issue #29 for the training log.


Top Results From Across the Web

How to handle gradient overflow when training a deep model ...
Hi, it is ok to train the model with fp32, but we would like to take advantage of the speed of mixed precision....

Apex Loss Scale not stopping - PyTorch Forums
If the gradients overflow due to loss scaling, the scaling value will be lowered and you will see this message.

gradient overflow - Apex usage tutorial and the gradient explosion problem
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 131072.0.

nvidia apex Gradient overflow. Skipping step, loss scaler 0 ...
nvidia apex Gradient overflow. ... mixed precision computation (Mixed Precision), and introduces Apex, a PyTorch-based mixed precision training accelerator developed by Nvidia, ...

docs_aicloud/torch_amp_example at master
Skipping step, loss scaler 0 reducing loss scale to 0.5 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.25 Torch...
