Decoder AMP training loss blow up
See original GitHub issue

Just pinning my training issue: when amp = True is set, the decoder's loss (unet1 & unet2) blows up to 1K+. (The loss stays below 0.01 with AMP disabled.)
Just curious: does half-precision training harm DDPM training?
As a reference: https://wandb.ai/hcaoaf/dalle2-decoder/runs/3sb0pumh?workspace=user-hcaoaf
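For context, a typical mixed-precision training loop in plain PyTorch looks like the sketch below. This is a generic example, not the trainer used in dalle2-pytorch; the model, loss, and batch names are placeholders standing in for the decoder's unet and its diffusion loss.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model/optimizer/data -- stand-ins for the decoder's unet training.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = GradScaler()  # dynamic loss scaling guards against fp16 under/overflow

dummy_batches = [torch.randn(8, 512) for _ in range(10)]

for step, batch in enumerate(dummy_batches):
    optimizer.zero_grad()
    with autocast():                                # forward pass runs in mixed precision
        loss = model(batch.cuda()).pow(2).mean()    # stand-in for the diffusion loss
    scaler.scale(loss).backward()                   # backward on the scaled loss
    scaler.unscale_(optimizer)                      # unscale so gradients can be clipped/inspected
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)                          # step is skipped if grads contain inf/NaN
    scaler.update()                                 # adapts the loss scale for the next step
```

If the scaler keeps skipping steps or the loss still diverges under AMP, inspecting the unscaled gradient norms (as above) is a common way to check whether fp16 overflow is the culprit.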
Issue Analytics
- State:
- Created: a year ago
- Reactions: 1
- Comments: 8 (4 by maintainers)
Top Results From Across the Web

What can be the cause of a sudden explosion in the loss when ...
The first 50k steps of the training the loss is quite stable and low, and suddenly it starts to exponentially explode. I wonder...
Read more >

loss explodes after few iterations · Issue #3868 - GitHub
I recheck csv files which created . I am training on only one class but gradient is exploding after few iterations exponentially please...
Read more >

Exploding loss in encoder/decoder model - PyTorch Forums
I'm trying to build a text to speech model in PyTorch using an encoder/decoder architecture on librispeech 100hr dataset.
Read more >

9 Tips For Training Lightning-Fast Neural Networks In Pytorch
The amp package will take care of most things for you. It'll even scale the loss if the gradients explode or go to...
Read more >

Exploding loss in encoder/decoder model PyTorch
You are calculating your loss only on the last mini batch, you should accumulate your loss and do backward on the accumulated loss...
Read more >
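The last snippet above points at a common bug: calling backward only on the loss of the final mini-batch instead of the accumulated loss. A minimal illustration of the difference, with generic PyTorch and placeholder names (not code from this issue):

```python
import torch

model = torch.nn.Linear(16, 1)
batches = [torch.randn(4, 16) for _ in range(8)]

# Buggy pattern: each iteration overwrites `loss`, so only the last
# mini-batch contributes to the backward pass.
loss = None
for batch in batches:
    loss = model(batch).pow(2).mean()
loss.backward()

model.zero_grad()

# Accumulated pattern: every mini-batch contributes, averaged over the batches.
total_loss = 0.0
for batch in batches:
    total_loss = total_loss + model(batch).pow(2).mean()
(total_loss / len(batches)).backward()
```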
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@lucidrains wow, cool built-in grad-accum support. Now everything works fine (max-batch-size: 32, total-batch-size: 265, amp: True). Here's w/o AMP.
@lucidrains let me try the current version
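For reference, gradient accumulation combined with AMP in plain PyTorch roughly follows the sketch below. The dalle2-pytorch trainer handles the splitting internally; the `accum_steps` and `micro_batches` names here are illustrative, not the library's actual API, and the numbers do not correspond to the config quoted above.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Generic grad-accumulation + AMP sketch with placeholder model and data.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = GradScaler()

accum_steps = 8                                     # one large batch split into 8 micro-batches
micro_batches = [torch.randn(4, 512) for _ in range(accum_steps)]

optimizer.zero_grad()
for micro in micro_batches:
    with autocast():
        loss = model(micro.cuda()).pow(2).mean()    # stand-in for the decoder loss
    # divide by accum_steps so the summed gradients match one large-batch step
    scaler.scale(loss / accum_steps).backward()

scaler.step(optimizer)                              # single optimizer step per accumulated batch
scaler.update()
optimizer.zero_grad()
```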