
Why is the training loss always NaN?

See original GitHub issue

I got loss output like this:


100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 424/424 [04:10<00:00,  2.24it/s]
[train] Epoch: 22/100 Loss: nan Acc: 0.010870849580527
Execution time: 250.25667172999238

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 108/108 [00:26<00:00,  5.16it/s]
[val] Epoch: 22/100 Loss: nan Acc: 0.011121408711770158
Execution time: 26.448329468010343

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 424/424 [04:09<00:00,  2.23it/s]
[train] Epoch: 23/100 Loss: nan Acc: 0.010870849580527
Execution time: 249.90277546200377

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 108/108 [00:26<00:00,  5.09it/s]
[val] Epoch: 23/100 Loss: nan Acc: 0.011121408711770158
Execution time: 26.87914375399123

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 424/424 [04:09<00:00,  2.24it/s]
[train] Epoch: 24/100 Loss: nan Acc: 0.010870849580527
Execution time: 249.9237438449927

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 108/108 [00:26<00:00,  5.16it/s]
[val] Epoch: 24/100 Loss: nan Acc: 0.011121408711770158
Execution time: 26.460865497996565

It's all NaN. What could be the reason?
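A quick way to catch this earlier: instead of letting training run for dozens of epochs of NaN, abort on the first non-finite loss so the offending step or batch can be inspected. A minimal, framework-agnostic sketch in plain Python (the `check_loss` helper and the simulated loss values are illustrative, not from the original training code):

```python
import math

def check_loss(loss_value, step):
    """Raise immediately when the loss stops being finite,
    so the first offending step can be inspected."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(
            f"non-finite loss {loss_value!r} at step {step}"
        )
    return loss_value

# Simulated per-step losses: the third step diverges.
losses = [2.31, 1.87, float("nan")]
for step, lv in enumerate(losses):
    try:
        check_loss(lv, step)
    except FloatingPointError as err:
        print(err)  # pinpoints the first bad step
        break
```

In PyTorch the same idea is available via `torch.isfinite(loss)`, or `torch.autograd.set_detect_anomaly(True)`, which additionally reports which backward operation produced the NaN.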

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 14 (1 by maintainers)

Top GitHub Comments

3 reactions
wave-transmitter commented, Feb 27, 2019

I also suffered from Loss: NaN… I reduced the learning rate from 1e-3 to 1e-1, but the result is the same (Loss: nan).

If the loss is NaN, the weights can't be updated meaningfully, so the model can't improve its accuracy… Has anybody solved this problem?

Reducing the learning rate means selecting a rate lower than 1e-3, such as 1e-5 or 0.5e-3. Personally, I trained the model from scratch on UCF101 with a learning rate of 1e-3 without any NaN issues.
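The learning-rate effect described above can be demonstrated without any deep-learning framework: with gradient descent on a toy objective, a step size that is too large makes the iterates explode to infinity and then NaN, while a smaller one converges. A sketch (the function f(x) = x**4 and the specific rates are illustrative only, not the model from this issue):

```python
import math

def gd(lr, x0=3.0, steps=50):
    """Plain gradient descent on f(x) = x**4, whose gradient is 4*x**3."""
    x = x0
    for _ in range(steps):
        grad = 4.0 * x * x * x  # x*x*x overflows to inf instead of raising
        x = x - lr * grad
    return x

print(gd(lr=1e-1))  # too large: iterates blow up; inf - inf yields nan
print(gd(lr=1e-3))  # small enough: iterates shrink toward the minimum at 0
```

This is the mechanism behind the advice above: once an update overshoots hard enough, the parameters (and every subsequent loss) become inf and then NaN, and no later epoch can recover.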

0 reactions
ilovekj commented, May 9, 2019

Maybe it's because we didn't use a pretrained model, but I'm not sure.

Read more comments on GitHub >

Top Results From Across the Web

why training and validation loss are always zero during the ...
During training, Train loss and validation loss are always zero. I don't know why? If anyone know the problem, please let me know...
Read more >
Why is my validation loss lower than my training loss?
Reason #2: Training loss is measured during each epoch while validation loss is measured after each epoch.
Read more >
What may be the reason for loss can not go further down when ...
I suggest you first try to train your model until it overfits. From what I can see, your model is still learning or...
Read more >
What should I do when my neural network doesn't learn?
The only way the NN can learn now is by memorising the training set, which means that the training loss will decrease very...
Read more >
When to Stop Training your Deep Learning Model
Although validation loss is a widely used metric when applying early stopping to your model, it is not always the most relevant.
Read more >
