
NaN-value attributions when using DeepLIFT during fp16 training

See original GitHub issue

Hi! I’m trying to compute attributions for BigBird via DeepLIFT during fp16 training, but many of the attribution values come out as NaN. If I train in fp32 instead, none of the attribution values are NaN. My guess is that fp16 precision truncates some near-zero intermediate values to zero, leading to division by zero. Would appreciate any ideas on how to work around this issue!
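To make the suspected failure mode concrete, here is a minimal, hypothetical sketch (not Captum’s actual internals): a DeepLIFT-style multiplier is roughly Δoutput / Δinput, and fp16 flushes values below its subnormal range (around 6e-8) to zero, so a 0/0 ratio yields NaN even though the same computation is finite in fp32.

```python
import torch

# Hypothetical illustration only: deltas that are tiny but nonzero in fp32
# underflow to exactly zero in fp16, so the multiplier ratio becomes 0/0 = NaN.
delta_out = torch.tensor([1e-8])   # fp32 by default
delta_in = torch.tensor([1e-8])

print(delta_out / delta_in)                # tensor([1.]) -- finite in fp32
print(delta_out.half() / delta_in.half())  # tensor([nan], dtype=torch.float16)
```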

For reference, the inputs and baselines parameters for my DeepLIFT attribute function call are constructed as follows (a code sketch of this construction follows the lists below):

inputs

  • Feed input ids into BigBird word embedding layer to get input embeddings.
  • Use input embeddings as inputs for DeepLIFT.

baselines

  • Create baseline ids by setting all non-special tokens in the input ids to PAD.
  • Feed baseline ids into BigBird word embedding layer to get baseline embeddings.
  • Use baseline embeddings as baselines for DeepLIFT.
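For concreteness, here is a rough sketch of that construction, assuming a BigBirdForSequenceClassification checkpoint and Captum’s DeepLift wrapper; the checkpoint name, example text, and target class are placeholders rather than the actual setup, and the Module wrapper around the embedding-level forward pass is one way (not necessarily the only way) to feed embeddings to DeepLift.

```python
import torch
from captum.attr import DeepLift
from transformers import BigBirdForSequenceClassification, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdForSequenceClassification.from_pretrained("google/bigbird-roberta-base")
model.eval()

enc = tokenizer("example input text", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
embed = model.get_input_embeddings()

# inputs: feed the input ids through the word embedding layer
input_embeds = embed(input_ids)

# baselines: set all non-special tokens to PAD, then embed the baseline ids
special = torch.tensor(
    tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True),
    dtype=torch.bool,
).unsqueeze(0)
baseline_ids = torch.where(special, input_ids, torch.full_like(input_ids, tokenizer.pad_token_id))
baseline_embeds = embed(baseline_ids)

# DeepLift expects an nn.Module, so wrap a forward pass that consumes embeddings
class BigBirdFromEmbeds(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, inputs_embeds, attention_mask):
        return self.model(inputs_embeds=inputs_embeds, attention_mask=attention_mask).logits

dl = DeepLift(BigBirdFromEmbeds(model))
attributions = dl.attribute(
    input_embeds,
    baselines=baseline_embeds,
    target=0,  # placeholder target class
    additional_forward_args=(attention_mask,),
)
```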

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
aarzchan commented, Feb 2, 2022

Hi @bilalsal, here is a Colab notebook demonstrating the error. I shared it with your FB email.

0 reactions
aarzchan commented, Mar 9, 2022

Hi @bilalsal and @NarineK, thanks for the fix! I don’t have any further remarks/questions, so I’m closing the issue.

Read more comments on GitHub >

Top Results From Across the Web

Train With Mixed Precision - NVIDIA Documentation Center
Mixed precision is the combined use of different numerical precisions in a computational method. Half precision (also known as FP16) data ...
Read more >
Robust Explainability: A tutorial on gradient-based attribution ...
Deep Learning Important FeaTures (DeepLIFT): DeepLIFT was designed to tackle the saturation problem using “reference activations”, calculated in the forward ...
Read more >
BFloat16: The secret to high performance on Cloud TPUs
Our hardware teams chose bfloat16 for Cloud TPUs to improve hardware efficiency while maintaining the ability to train accurate deep learning ...
Read more >
arXiv:1711.06104v4 [cs.LG] 7 Mar 2018
Backpropagation-based methods compute the attributions for all input features in a single forward and backward pass through the network.
Read more >
Understanding Mixed Precision Training | by Jonathan Davis
We introduce methodology for training deep neural networks using ... In mixed-precision training, FP16 is used instead to store the weights, ...
Read more >
