
NaN-value attributions when using DeepLIFT during fp16 training

See original GitHub issue

Hi! I’m trying to compute attributions for BigBird via DeepLIFT during fp16 training, but many of the attribution values come out as NaN. If I train in fp32 instead, none of the attribution values are NaN. My guess is that fp16 precision truncates some near-zero intermediate values to zero, leading to division by zero. Would appreciate any ideas on how to work around this issue!
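To make the suspected failure mode concrete, here is a minimal, hypothetical sketch (not Captum’s actual internals): a DeepLIFT-style multiplier is roughly Δoutput / Δinput, and fp16 flushes values below its subnormal range (around 6e-8) to zero, so a 0/0 ratio yields NaN even though the same computation is finite in fp32.

```python
import torch

# Hypothetical illustration only: deltas that are tiny but nonzero in fp32
# underflow to exactly zero in fp16, so the multiplier ratio becomes 0/0 = NaN.
delta_out = torch.tensor([1e-8])   # fp32 by default
delta_in = torch.tensor([1e-8])

print(delta_out / delta_in)                # tensor([1.]) -- finite in fp32
print(delta_out.half() / delta_in.half())  # tensor([nan], dtype=torch.float16)
```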

For reference, the inputs and baselines parameters for my DeepLIFT attribute function call are constructed as follows (a code sketch of this construction follows the lists below):

inputs

  • Feed input ids into BigBird word embedding layer to get input embeddings.
  • Use input embeddings as inputs for DeepLIFT.

baselines

  • Create baseline ids by setting all non-special tokens in the input ids to PAD.
  • Feed baseline ids into BigBird word embedding layer to get baseline embeddings.
  • Use baseline embeddings as baselines for DeepLIFT.
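For concreteness, here is a rough sketch of that construction, assuming a BigBirdForSequenceClassification checkpoint and Captum’s DeepLift wrapper; the checkpoint name, example text, and target class are placeholders rather than the actual setup, and the Module wrapper around the embedding-level forward pass is one way (not necessarily the only way) to feed embeddings to DeepLift.

```python
import torch
from captum.attr import DeepLift
from transformers import BigBirdForSequenceClassification, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdForSequenceClassification.from_pretrained("google/bigbird-roberta-base")
model.eval()

enc = tokenizer("example input text", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
embed = model.get_input_embeddings()

# inputs: feed the input ids through the word embedding layer
input_embeds = embed(input_ids)

# baselines: set all non-special tokens to PAD, then embed the baseline ids
special = torch.tensor(
    tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True),
    dtype=torch.bool,
).unsqueeze(0)
baseline_ids = torch.where(special, input_ids, torch.full_like(input_ids, tokenizer.pad_token_id))
baseline_embeds = embed(baseline_ids)

# DeepLift expects an nn.Module, so wrap a forward pass that consumes embeddings
class BigBirdFromEmbeds(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, inputs_embeds, attention_mask):
        return self.model(inputs_embeds=inputs_embeds, attention_mask=attention_mask).logits

dl = DeepLift(BigBirdFromEmbeds(model))
attributions = dl.attribute(
    input_embeds,
    baselines=baseline_embeds,
    target=0,  # placeholder target class
    additional_forward_args=(attention_mask,),
)
```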

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
aarzchan commented, Feb 2, 2022

Hi @bilalsal, here is a Colab notebook demonstrating the error. I shared it with your FB email.

0 reactions
aarzchan commented, Mar 9, 2022

Hi @bilalsal and @NarineK, thanks for the fix! I don’t have any further remarks/questions, so I’m closing the issue.

Read more comments on GitHub >

Top Results From Across the Web

Train With Mixed Precision - NVIDIA Documentation Center
Mixed precision is the combined use of different numerical precisions in a computational method. Half precision (also known as FP16) data ...
Read more >
Robust Explainability: A tutorial on gradient-based attribution ...
Deep Learning Important FeaTures (DeepLIFT): DeepLIFT was designed to tackle the saturation problem using “reference activations”, calculated in the forward ...
Read more >
BFloat16: The secret to high performance on Cloud TPUs
Our hardware teams chose bfloat16 for Cloud TPUs to improve hardware efficiency while maintaining the ability to train accurate deep learning ...
Read more >
arXiv:1711.06104v4 [cs.LG] 7 Mar 2018
Backpropagation-based methods compute the attributions for all input features in a single forward and backward pass through the network.
Read more >
Understanding Mixed Precision Training | by Jonathan Davis
We introduce methodology for training deep neural networks using ... In mixed-precision training, FP16 is used instead to store the weights, ...
Read more >
