Multimodal input binary classifier with Saliency
❓ Questions and Help
Hi everyone,
Question:
How can I apply saliency to a dataset composed of categorical and image data?
I am somewhat of a beginner with PyTorch, and the available resources just aren't clicking with my use case. My ultimate goal is to plot the saliency of a model, but I am stuck on calculating the gradient. Any help or guidance would be much appreciated.
What I’ve reviewed:
- The Multimodal_VQA_Captum_Insights tutorial
- The BERT tutorials

These resources all use very different data structures (images/sentences), which makes them confusing for a beginner to translate to a simpler image/categorical dataset.
My issue
Model
Model(
(label_embedding): Embedding(10, 10)
(model): Sequential(
(0): Linear(in_features=1034, out_features=512, bias=True)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Linear(in_features=512, out_features=512, bias=True)
(3): Dropout(p=0.4, inplace=False)
(4): LeakyReLU(negative_slope=0.2, inplace=True)
(5): Linear(in_features=512, out_features=512, bias=True)
(6): Dropout(p=0.4, inplace=False)
(7): LeakyReLU(negative_slope=0.2, inplace=True)
(8): Linear(in_features=512, out_features=1, bias=True)
)
)
Categorical
The categorical data is just a normal integer label, such as 4.
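For context, an integer label like this is typically mapped to a dense vector by the `label_embedding` layer before being concatenated with the image features. A minimal illustration (shapes assumed from the model printed above); note that the embedding *output* is differentiable even though the integer index is not:

```python
import torch
import torch.nn as nn

# nn.Embedding(10, 10) as in the model: 10 classes, 10-dim vectors.
label_embedding = nn.Embedding(10, 10)
label = torch.tensor([4])        # the raw categorical label (integer index)
vec = label_embedding(label)     # dense, differentiable representation
print(vec.shape)                 # torch.Size([1, 10])
```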
What I tried
saliency = Saliency(model)
grads = saliency.attribute((input, label), target=None)
where input is just a (1, 32, 32) image. I set target=None since it’s a binary classifier.
Failure output
One of the differentiated Tensors does not require grad
Since the label is an integer tensor, it cannot require grad. Is there a way I can use the Saliency method to capture the gradients?
Issue Analytics
- Created: 2 years ago
- Comments: 17 (10 by maintainers)

I am happy to create a PR for it.
Thanks for the good suggestion, @nanohanno!
Do you want to create a PR for this? I can gladly help get this landed. I can also update the message myself if needed.