Erroneous unused differentiated Tensor when modifying return of detectron2 Faster RCNN
🐛 Bug
I was trying to apply Integrated Gradients to the detectron2 implementation of a COCO pre-trained Faster RCNN (specifically COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml). In order to get a tensor of shape [K] containing the predicted probabilities for all K classes, I subclassed the appropriate classes and changed the output to the N rows of K predicted probabilities instead of N Instances, then summed the N rows in my wrapper function.
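For context, the wrapper is roughly of this form (a simplified sketch with illustrative names such as ScoreSumWrapper; the actual code is in the repo linked below):

```python
import torch.nn as nn
from captum.attr import IntegratedGradients

# Simplified sketch -- the real code subclasses the detectron2 classes instead.
class ScoreSumWrapper(nn.Module):
    def __init__(self, modified_rcnn):
        super().__init__()
        self.model = modified_rcnn  # modified to return an [N, K] score tensor

    def forward(self, images):
        scores = self.model(images)             # [N, K] predicted probabilities
        return scores.sum(dim=0, keepdim=True)  # [1, K], summed over the N rows

# `modified_faster_rcnn`, `input_image` and `pred_class_idx` are placeholders.
wrapped = ScoreSumWrapper(modified_faster_rcnn)
ig = IntegratedGradients(wrapped)
attributions, delta = ig.attribute(
    input_image,                    # [1, 3, H, W] input tensor
    target=pred_class_idx,          # class index to attribute
    return_convergence_delta=True,
)
```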
However, running ig.attribute gives me:
```
Traceback (most recent call last):
  File "main.py", line 84, in <module>
    return_convergence_delta=True)
  File "/usr/local/lib/python3.6/dist-packages/captum/log/__init__.py", line 35, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/captum/attr/_core/integrated_gradients.py", line 291, in attribute
    method=method,
  File "/usr/local/lib/python3.6/dist-packages/captum/attr/_core/integrated_gradients.py", line 354, in _attribute
    additional_forward_args=input_additional_args,
  File "/usr/local/lib/python3.6/dist-packages/captum/_utils/gradient.py", line 118, in compute_gradients
    grads = torch.autograd.grad(torch.unbind(outputs), inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 225, in grad
    inputs, allow_unused, accumulate_grad=False)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
```
To Reproduce
Steps to reproduce the behavior:
- Minimal example can be found here: https://github.com/mreso/captum-detectron2/tree/simplifications
- Follow the README to build the Docker image and run main.py
Expected behavior
ig.attribute runs and returns attributions, delta as expected
Environment
Describe the environment used for Captum
Used the nvidia/cuda:10.1-devel-ubuntu18.04 image, with captum==0.4.1, torch==1.8.1+cu101, torchvision==0.9.1+cu101. The Dockerfile and requirements.txt are in the repo above.
Additional Information
I’ve looked at the following issues and have verified that my issue does not overlap with them:
- https://github.com/pytorch/captum/issues/733 - the input is a 3-channel image and nothing else; the CNN should be using all of it
- https://github.com/pytorch/captum/issues/303 - the returned tensor is created through .clone() and slicing; I have tried .detach() on all other tensors to no avail (see the minimal repro sketch after this list)
- https://github.com/pytorch/captum/issues/139 - the abstractions around detectron2 have been resolved through subclassing
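For reference, this error class can be reproduced in isolation: if any tensor along the forward pass is detached or rebuilt as a fresh tensor, torch.autograd.grad can no longer trace the output back to the input and raises exactly this error (a minimal sketch, unrelated to detectron2):

```python
import torch

x = torch.randn(3, requires_grad=True)

# Rebuilding the input as a fresh tensor (e.g. via .detach(), a .numpy()
# round trip, or torch.as_tensor on a copy) severs the autograd link to x.
y = (x.detach() * 2).sum()

# Raises: RuntimeError: One of the differentiated Tensors appears to not have
# been used in the graph. Set allow_unused=True if this is the desired behavior.
grads = torch.autograd.grad(y, x)
```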
Comments (13, 6 by maintainers)
Ah yes, I think it’s this line https://github.com/reedless/captum-detectron2/blob/main/modified_image_list.py#L61
For IG, do you mean that when the algorithm passes the default 50 inputs on the first forward pass, the next layers of the model should keep the predictions for the 50 inputs entirely separate (i.e. 50 x 1000 x 81) rather than merge them together (i.e. 50000 x 81)?
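(For concreteness, this is roughly what IG does with the default n_steps=50 before the forward pass; a sketch of the idea, not Captum's actual implementation:)

```python
import torch

n_steps = 50
baseline = torch.zeros_like(input_image)                  # [1, 3, H, W]
alphas = torch.linspace(0, 1, n_steps).view(-1, 1, 1, 1)

# IG evaluates the model on a batch of 50 interpolated images; the question is
# whether the downstream layers keep this leading batch dimension separate.
scaled_inputs = baseline + alphas * (input_image - baseline)   # [50, 3, H, W]
```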
Hi @reedless, I debugged this case with LayerGradientXActivation. I computed the gradients w.r.t. the backbone layer; when I compute the gradients w.r.t. the layer inputs to the backbone instead, it works. This suggests the issue lies in the image preprocessing.
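Something along these lines (a sketch with hypothetical handles for the wrapper and the backbone) can be used to probe where the graph breaks:

```python
from captum.attr import LayerGradientXActivation

# `wrapped` is the wrapper module, `wrapped.model.backbone` the detectron2 FPN.
lga = LayerGradientXActivation(wrapped, wrapped.model.backbone)

# Attribute w.r.t. the backbone's output activations ...
attr_out = lga.attribute(input_image, target=pred_class_idx)

# ... and w.r.t. the backbone's inputs instead:
attr_in = lga.attribute(
    input_image,
    target=pred_class_idx,
    attribute_to_layer_input=True,
)
```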
I think new tensors are probably being created somewhere around here: https://github.com/mreso/captum-detectron2/blob/simplifications/main.py#L41
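If so, the breaking pattern would look something like this (hypothetical; I haven't confirmed that this is exactly what happens at that line):

```python
import torch

def preprocess(image_tensor):
    # Any round trip through numpy (or torch.tensor / torch.as_tensor on a
    # copy) produces a fresh leaf tensor with no grad history, cutting the
    # graph between the original input and everything downstream.
    arr = image_tensor.detach().cpu().numpy()
    return torch.as_tensor(arr).float()
```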
Specifically for IG, we need to think about whether this model architecture will work at all, because IG scales the inputs by the number of steps. When I scaled the inputs, I noticed that the scaling isn’t passed down to the further layers; I think the model compresses the inputs when it is creating the boxes. You can, for example, try simple Saliency, Occlusion or FeatureAblation instead of IG. For IG we need to debug further and find out whether the input scaling is being passed down to the next layers of the model.
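For example, something like the following should run with the same wrapper (a sketch reusing the hypothetical `wrapped` model and inputs from above):

```python
from captum.attr import Occlusion, Saliency

# Plain gradients, no scaling of inputs across steps:
sal_attr = Saliency(wrapped).attribute(input_image, target=pred_class_idx)

# Perturbation-based, so no gradients through the detectron2 graph at all:
occ_attr = Occlusion(wrapped).attribute(
    input_image,
    target=pred_class_idx,
    sliding_window_shapes=(3, 16, 16),  # occlude 16x16 patches across channels
    strides=(3, 8, 8),
)
```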