[Feature] Terminate on NaN handler

What do you think about adding a handler that stops the training when NaN is encountered in the loss output? It is not that complicated to check with a custom handler, but since such handlers can be found in other popular frameworks, maybe it is worth adding something similar to ignite?
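The custom handler mentioned above could look roughly like the sketch below. The names `TerminateOnNonFinite`, `StubEngine` and `run` are hypothetical and exist only for illustration. The handler assumes the engine exposes the last loss as a scalar in `engine.state.output` and offers a `terminate()` method, which is how an ignite `Engine` behaves when the process function returns the loss. The stub engine stands in for a real one so the example runs without any dependencies.

```python
import math


class TerminateOnNonFinite:
    """Hypothetical handler: request termination when the loss is NaN or Inf."""

    def __call__(self, engine):
        # Assumes the process function returned a scalar loss.
        loss = float(engine.state.output)
        if not math.isfinite(loss):
            engine.terminate()


class _State:
    def __init__(self):
        self.output = None


class StubEngine:
    """Stand-in mimicking only the Engine attributes the handler touches."""

    def __init__(self):
        self.state = _State()
        self.should_terminate = False

    def terminate(self):
        self.should_terminate = True


def run(losses):
    """Feed losses to the handler one by one; return how many were processed."""
    engine = StubEngine()
    handler = TerminateOnNonFinite()
    processed = 0
    for loss in losses:
        engine.state.output = loss
        processed += 1
        handler(engine)
        if engine.should_terminate:
            break
    return processed


if __name__ == "__main__":
    print(run([0.5, 0.4, float("nan"), 0.3]))
```

With a real ignite trainer, such a handler would presumably be attached with `trainer.add_event_handler(Events.ITERATION_COMPLETED, TerminateOnNonFinite())`, again assuming the output is a plain scalar loss.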

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
vfdev-5 commented, Jul 26, 2018

A solution can be proposed using the anomaly detection introduced in PyTorch 0.4.1.
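As a rough illustration of that idea, the sketch below uses the `torch.autograd.detect_anomaly()` context manager, which makes the backward pass raise a `RuntimeError` when a backward function produces NaN. The helper name `backward_raises_on_nan` and the toy computation are assumptions made for the example, and it requires PyTorch to be installed. Note that this catches NaN gradients; a NaN in the forward loss value would still need an explicit check.

```python
import torch


def backward_raises_on_nan() -> bool:
    """Return True if anomaly detection raises on a NaN gradient."""
    x = torch.zeros(1, requires_grad=True)
    try:
        with torch.autograd.detect_anomaly():
            # The loss is finite (0.0), but the gradient of sqrt at 0
            # becomes 0 / 0 = NaN during the backward pass.
            loss = (torch.sqrt(x) * 0.0).sum()
            loss.backward()
    except RuntimeError:
        return True
    return False


if __name__ == "__main__":
    print(backward_raises_on_nan())
```

Anomaly detection slows training down noticeably, so it is probably better suited to debugging a failing run than to being left on permanently.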

0 reactions
vfdev-5 commented, Jul 24, 2018

As discussed in Slack, the correct behaviour in a situation with a NaN/Inf loss is probably to stop the training:

NaN + anything = NaN, so you’d just continue getting NaN. For it to be useful, you need a mechanism to roll back to a previous value which was not NaN.
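The roll-back mechanism mentioned in the quote could be sketched as follows. This is a framework-free toy rather than anything ignite provides: `train_with_rollback` and `toy_step` are hypothetical names, the parameters are a plain dict, and the step function is assumed to mutate the parameters in place and return a scalar loss.

```python
import copy
import math


def train_with_rollback(params, steps, step_fn):
    """Run step_fn up to `steps` times.

    On a NaN/Inf loss, restore the last finite state and stop.
    Returns the index of the failing step, or `steps` if none failed.
    """
    last_good = copy.deepcopy(params)
    for i in range(steps):
        loss = step_fn(params, i)
        if not math.isfinite(loss):
            # Roll back: discard the corrupted state.
            params.clear()
            params.update(last_good)
            return i
        # The step succeeded, so remember this state.
        last_good = copy.deepcopy(params)
    return steps


def toy_step(params, i):
    """Hypothetical update that blows up on its fourth call."""
    if i == 3:
        params["w"] = float("nan")
        return float("nan")
    params["w"] -= 0.1
    return params["w"] ** 2


if __name__ == "__main__":
    params = {"w": 1.0}
    stopped_at = train_with_rollback(params, 10, toy_step)
    print(stopped_at, params)
```

Copying the full state after every step is expensive for real models, so a practical version would more likely snapshot every N iterations or reuse existing checkpoints.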

Top Results From Across the Web

tf.keras.callbacks.TerminateOnNaN | TensorFlow v2.11.0
Callback that terminates training when a NaN loss is encountered.

Tag Archives: terminate on Nan - TheAILearner
Callback can terminate a training when a Nan loss occurs. Callback can save the model after every epoch, also you can save the...

Keras / NN - Handling NaN, missing input - Stack Overflow
My NN was not training so I added a piece of code to remove them from my set, but now I have some...

Check 0.15.2: 4 Advanced Features
Compares two null-terminated char * string values, using the strcmp() function internally, and displays predefined message with condition and input ...

Exceptions and Exception Handling
FEX_ABORT mode causes the program to call abort (3c) when the exception occurs. FEX_SIGNAL installs the handling function specified by the handler argument...
