Get nan loss during training


❓ Questions and Help

I am training a model modified from maskrcnn-benchmark. When the model is trained on a single GPU, the loss behaves normally, but when it is trained on 4 GPUs the loss easily becomes NaN. How can I solve this problem?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
yuleichin commented, Jun 14, 2019

I ran into the same problem. When training with 4 GPUs, the loss becomes NaN within the first few iterations.

0 reactions
auroua commented, Oct 17, 2019

This problem was solved by setting the parameter WARMUP_ITERS to 1000.
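WARMUP_ITERS sets how many iterations the learning-rate warmup lasts, so raising it keeps the learning rate lower for longer at the start of training, which is exactly where the 4-GPU runs above diverged. The following is a minimal sketch of a linear warmup schedule, modeled on the linear warmup we believe maskrcnn-benchmark's solver uses. `warmup_lr` is a hypothetical helper, not the library's API, and the starting factor of 1/3 and the base LR of 0.02 are assumed, illustrative values.

```python
def warmup_lr(iteration, base_lr, warmup_iters, warmup_factor=1.0 / 3):
    """Learning rate at `iteration` under a linear warmup schedule.

    Hypothetical helper: the LR starts at base_lr * warmup_factor and rises
    linearly to base_lr over `warmup_iters` iterations; afterwards it stays
    at base_lr (any later step decays are omitted from this sketch).
    """
    if iteration >= warmup_iters:
        return base_lr
    alpha = iteration / warmup_iters  # fraction of warmup completed, in [0, 1)
    factor = warmup_factor * (1 - alpha) + alpha  # interpolates warmup_factor -> 1
    return base_lr * factor


if __name__ == "__main__":
    base_lr = 0.02  # assumed, illustrative value; not taken from the issue
    for warmup_iters in (500, 1000):
        schedule = [warmup_lr(it, base_lr, warmup_iters) for it in (0, 100, 250, 500, 1000)]
        print(f"WARMUP_ITERS={warmup_iters}:", [f"{lr:.4f}" for lr in schedule])
```

Under these assumptions, the LR at iteration 250 is half the base LR with WARMUP_ITERS=1000 but two thirds of it with WARMUP_ITERS=500. The longer warmup therefore spends more of the early, unstable iterations at a reduced step size.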


Top Results From Across the Web

Deep-Learning Nan loss reasons - python - Stack Overflow
You may have an issue with the input data. Try calling assert not np.any(np.isnan(x)) on the input data to make sure you are...
Common Causes of NANs During Training
Common Causes of NANs During Training · Gradient blow up · Bad learning rate policy and params · Faulty Loss function · Faulty...
Getting NaN for loss - General Discussion - TensorFlow Forum
You transform X_train but pass X_train_A and X_train_B into the model, which were never transformed by the scaler and contain negative values.
Debugging a Machine Learning model written in TensorFlow ...
In this article, you get to look over my shoulder as I go about debugging a ... a model that doesn't train, there...
Keras Sequential model returns loss 'nan'
@lcrmorin I'm pretty sure that my dataset doesn't contain nan elements. However, I notice that the loss turn to nan when I changed...
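The Stack Overflow and "Common Causes" results point at two quick checks: confirm that the input data contains no NaNs (the `np.isnan` assertion quoted above), and find the first iteration at which the loss stops being finite. Below is a minimal sketch in Python with NumPy. `assert_finite` and `first_nan_iteration` are hypothetical helpers, not part of maskrcnn-benchmark, and the loss values in the demo are made up for illustration.

```python
import math

import numpy as np


def assert_finite(name, values):
    """Hypothetical helper: raise ValueError if `values` holds any NaN or +/-inf."""
    arr = np.asarray(values, dtype=np.float64)
    bad = ~np.isfinite(arr)  # True wherever the entry is NaN or infinite
    if bad.any():
        first = tuple(int(i) for i in np.argwhere(bad)[0])
        raise ValueError(f"{name}: {int(bad.sum())} non-finite value(s), first at index {first}")


def first_nan_iteration(losses):
    """Hypothetical helper: index of the first non-finite loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


if __name__ == "__main__":
    images = np.random.rand(2, 3, 8, 8)  # stand-in for a batch of input images
    assert_finite("images", images)  # passes: rand() only returns values in [0, 1)
    # Made-up loss history: it blows up before turning into NaN.
    print(first_nan_iteration([2.31, 1.97, 5.4e3, float("nan")]))
```

Checking `math.isfinite` rather than only `math.isnan` also flags an infinite loss, which often appears one step before the NaN when gradients blow up.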