Adaptive Pixel Intensity Loss generated NaN values while training

I was training on a custom human dataset. Batch size = 8. Number of training images = 3800.

Number of steps trained before the error appeared = 75.

After the 75th step, training raised this error:

RuntimeError: Function 'UpsampleBilinear2DBackward1' returned nan values in its 0th output.

The model trained successfully when using BCE loss.

We even checked for NaN values using torch.autograd.set_detect_anomaly(True), but it returned False, indicating that no NaN values were found.
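For readers reproducing this kind of check, here is a minimal sketch of how anomaly detection is typically switched on in PyTorch (note the spelling `set_detect_anomaly`). The toy loss below is an assumption made purely for illustration and is not the Adaptive Pixel Intensity loss, and `first_non_finite` is a hypothetical helper. It shows that a loss can be finite in the forward pass while its backward pass still produces NaN, which may explain why a forward-only check finds nothing.

```python
import torch


def first_non_finite(named_tensors):
    """Return the name of the first tensor containing NaN or Inf, else None.

    Hypothetical helper for illustration; not part of PyTorch or TRACER.
    """
    for name, tensor in named_tensors:
        if not torch.isfinite(tensor).all():
            return name
    return None


# With anomaly detection on, backward() raises as soon as a backward
# function returns NaN, and the traceback of the forward op is reported.
torch.autograd.set_detect_anomaly(True)

# Toy reproduction (assumption, not the TRACER loss): the derivative of
# sqrt at 0 is infinite, and the upstream gradient here is 0, so the
# backward pass computes 0 / 0 = NaN even though the forward value is 0.
x = torch.zeros(1, requires_grad=True)
loss = (torch.sqrt(x) * 0.0).sum()

# The forward value is finite, so a forward-only check reports nothing.
forward_problem = first_non_finite([("loss", loss)])

try:
    loss.backward()
    backward_error = None
except RuntimeError as err:
    backward_error = str(err)

# Anomaly detection slows training, so switch it off after debugging.
torch.autograd.set_detect_anomaly(False)

print("forward problem:", forward_problem)
print("backward error:", backward_error)
```

In a real training loop the same helper could be applied to the model outputs and each loss term before calling `backward()`, to narrow down which term first becomes non-finite.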

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:10 (4 by maintainers)

Top GitHub Comments

1 reaction
Karel911 commented, Jan 28, 2022

@Karel911 can you help me with removing the edge generation parts? I am facing a similar issue.

I am also curious about which parts cause this issue. I have released a version of TRACER without edge generation. Replace the existing scripts with the released ones. I have only tested it briefly, so if there is any problem, please let me know.

Thanks.

0 reactions
hackkhai commented, Jan 28, 2022

Thanks, let me check this out.

Top Results From Across the Web

Common Causes of NANs During Training
Gradient blow up · Bad learning rate policy and params · Faulty Loss function · Faulty...

NaN loss when training regression network - Stack Overflow
In my case, I use the log value of density estimation as an input. The absolute value could be very huge, which may...

What should I do when my neural network doesn't learn?
NA or NaN or Inf values in your data creating NA or NaN or Inf values in the output, and therefore in the...

NAN loss for regression while training · Issue #2134 - GitHub
I'm running a regression model on patches of size 32x32 extracted from images against a real value as the target value.

Debugging a Machine Learning model written in TensorFlow ...
I wrote up a convnet model borrowing liberally from the training loop of the ... NaN loss. Now, when I ran it though,...
