NaN-value attributions when using DeepLIFT during fp16 training
Hi! I'm trying to compute attributions for BigBird via DeepLIFT during fp16 training, but I'm getting NaN for many of the attribution values. However, if I do fp32 training, none of the attribution values are NaN. I suspect the fp16 precision is truncating some near-zero intermediate values to zero, resulting in division by zero. I would appreciate any ideas on how to work around this issue!
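As a quick illustration of the suspected underflow (my own sketch, not from the thread): values that are perfectly representable in fp32 can round to exactly zero in fp16, so a small guard term or near-zero difference on that scale vanishes and a subsequent division yields inf/NaN.

```python
import torch

# Near-zero values that survive in fp32 underflow to exactly 0 in fp16,
# since the smallest positive fp16 subnormal is ~6e-8.
x = torch.tensor(1e-10)
print(x)                                     # tensor(1.0000e-10)
print(x.half())                              # tensor(0., dtype=torch.float16)
print(torch.tensor(1.0).half() / x.half())   # tensor(inf, dtype=torch.float16)
```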
For reference, the `inputs` and `baselines` parameters for my DeepLIFT `attribute` function call are constructed as follows (sketched in code after the list):

`inputs`
- Feed the input ids into the BigBird word embedding layer to get input embeddings.
- Use the input embeddings as `inputs` for DeepLIFT.

`baselines`
- Create baseline ids by setting all non-special tokens in the input ids to PAD.
- Feed the baseline ids into the BigBird word embedding layer to get baseline embeddings.
- Use the baseline embeddings as `baselines` for DeepLIFT.
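A minimal sketch of the construction above, assuming a Hugging Face BigBird sequence-classification checkpoint and Captum's `DeepLift`. The checkpoint name, the `FromEmbeddings` wrapper, and the idea of running the attribution pass in fp32 are my own assumptions for illustration, not details from the original report:

```python
import torch
from captum.attr import DeepLift
from transformers import AutoTokenizer, BigBirdForSequenceClassification

# Hypothetical checkpoint; substitute the model actually being trained.
name = "google/bigbird-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = BigBirdForSequenceClassification.from_pretrained(name).eval()

class FromEmbeddings(torch.nn.Module):
    """Wrapper so DeepLIFT can start from inputs_embeds instead of input ids."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, inputs_embeds, attention_mask):
        return self.model(inputs_embeds=inputs_embeds,
                          attention_mask=attention_mask).logits

enc = tokenizer("an example sentence", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

# Baseline ids: keep special tokens, replace everything else with PAD.
special = torch.tensor(
    tokenizer.get_special_tokens_mask(input_ids[0].tolist(),
                                      already_has_special_tokens=True)
).bool().unsqueeze(0)
baseline_ids = torch.where(special, input_ids,
                           torch.full_like(input_ids, tokenizer.pad_token_id))

# Embed both id tensors through the word embedding layer.
# If the model is in fp16 during training, one candidate workaround (an
# assumption, not a confirmed fix) is to run this attribution pass entirely
# in fp32, e.g. on an fp32 copy of the model with fp32 embeddings.
embed = model.get_input_embeddings()
inputs_embeds = embed(input_ids).float()
baseline_embeds = embed(baseline_ids).float()

dl = DeepLift(FromEmbeddings(model))
attributions = dl.attribute(
    inputs_embeds,
    baselines=baseline_embeds,
    additional_forward_args=(attention_mask,),
    target=0,  # class index to attribute
)
print(torch.isnan(attributions).any())
```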
Hi @bilalsal, here is a Colab notebook demonstrating the error. I shared it with your FB email.
Hi @bilalsal and @NarineK, thanks for the fix! I don’t have any further remarks/questions, so I’m closing the issue.