Decoder AMP training loss blow up
See original GitHub issue

Just pinning my training issue: when amp = True is set, the decoder's loss (unet1 & unet2) blows up to 1K+. (The loss stays below 0.01 with AMP disabled.)
Just curious: does half-precision training harm DDPM training?
As a reference: https://wandb.ai/hcaoaf/dalle2-decoder/runs/3sb0pumh?workspace=user-hcaoaf
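For context, a typical mixed-precision training loop in plain PyTorch looks like the sketch below. This is a generic example, not the trainer used in dalle2-pytorch; the model, loss, and batch names are placeholders standing in for the decoder's unet and its diffusion loss.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model/optimizer/data -- stand-ins for the decoder's unet training.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = GradScaler()  # dynamic loss scaling guards against fp16 under/overflow

dummy_batches = [torch.randn(8, 512) for _ in range(10)]

for step, batch in enumerate(dummy_batches):
    optimizer.zero_grad()
    with autocast():                                # forward pass runs in mixed precision
        loss = model(batch.cuda()).pow(2).mean()    # stand-in for the diffusion loss
    scaler.scale(loss).backward()                   # backward on the scaled loss
    scaler.unscale_(optimizer)                      # unscale so gradients can be clipped/inspected
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)                          # step is skipped if grads contain inf/NaN
    scaler.update()                                 # adapts the loss scale for the next step
```

If the scaler keeps skipping steps or the loss still diverges under AMP, inspecting the unscaled gradient norms (as above) is a common way to check whether fp16 overflow is the culprit.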
Issue Analytics
- State:
- Created: a year ago
- Reactions: 1
- Comments: 8 (4 by maintainers)
Top Results From Across the Web

What can be the cause of a sudden explosion in the loss when ...
The first 50k steps of the training the loss is quite stable and low, and suddenly it starts to exponentially explode. I wonder...
Read more >

loss explodes after few iterations · Issue #3868 - GitHub
I recheck csv files which created . I am training on only one class but gradient is exploding after few iterations exponentially please...
Read more >

Exploding loss in encoder/decoder model - PyTorch Forums
I'm trying to build a text to speech model in PyTorch using an encoder/decoder architecture on librispeech 100hr dataset.
Read more >

9 Tips For Training Lightning-Fast Neural Networks In Pytorch
The amp package will take care of most things for you. It'll even scale the loss if the gradients explode or go to...
Read more >

Exploding loss in encoder/decoder model PyTorch
You are calculating your loss only on the last mini batch, you should accumulate your loss and do backward on the accumulated loss...
Read more >
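The last snippet above points at a common bug: calling backward only on the loss of the final mini-batch instead of the accumulated loss. A minimal illustration of the difference, with generic PyTorch and placeholder names (not code from this issue):

```python
import torch

model = torch.nn.Linear(16, 1)
batches = [torch.randn(4, 16) for _ in range(8)]

# Buggy pattern: each iteration overwrites `loss`, so only the last
# mini-batch contributes to the backward pass.
loss = None
for batch in batches:
    loss = model(batch).pow(2).mean()
loss.backward()

model.zero_grad()

# Accumulated pattern: every mini-batch contributes, averaged over the batches.
total_loss = 0.0
for batch in batches:
    total_loss = total_loss + model(batch).pow(2).mean()
(total_loss / len(batches)).backward()
```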
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@lucidrains wow, cool built-in grad-accum support. Now everything works fine (max-batch-size: 32, total-batch-size: 265, amp: True). Here's w/o AMP.
@lucidrains let me try the current version
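For reference, gradient accumulation combined with AMP in plain PyTorch roughly follows the sketch below. The dalle2-pytorch trainer handles the splitting internally; the `accum_steps` and `micro_batches` names here are illustrative, not the library's actual API, and the numbers do not correspond to the config quoted above.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Generic grad-accumulation + AMP sketch with placeholder model and data.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = GradScaler()

accum_steps = 8                                     # one large batch split into 8 micro-batches
micro_batches = [torch.randn(4, 512) for _ in range(accum_steps)]

optimizer.zero_grad()
for micro in micro_batches:
    with autocast():
        loss = model(micro.cuda()).pow(2).mean()    # stand-in for the decoder loss
    # divide by accum_steps so the summed gradients match one large-batch step
    scaler.scale(loss / accum_steps).backward()

scaler.step(optimizer)                              # single optimizer step per accumulated batch
scaler.update()
optimizer.zero_grad()
```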