self._normalize() got NAN...

Bug description

When I print upsampled_a after upsampled_a = self._normalize(self.hook_a, self.hook_a.ndim - 2), some of the values are NaN. Is this a bug?

I set the model to vgg19, with target_layer1='features.35' and fc_layer1='classifier.6'.

Code snippet to reproduce the bug

nan

Error traceback

nan

Environment

nan

Top GitHub Comments

frgfm commented, Sep 18, 2022

Hi @lars-nieradzik 👋

Thanks for the specifics, I managed to reproduce the bug. I think I have identified the problem, so I opened PR #185, which should fix this!

The problem was:

ScoreCAM is a bit specific and forwards modified input tensors (some with zero variance)
the normalization in ScoreCAM was performed in place (hence hook_a containing NaNs: it actually held the normalized version)
to avoid this in this specific case, I added an eps to the division step of the normalization
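To illustrate the points above, here is a minimal sketch of an out-of-place min-max normalization with an eps in the denominator. This is not the code from the PR: the function name normalize_cam, the eps value and the tensor shapes are assumptions chosen for illustration only.

```python
import torch


def normalize_cam(cam: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Hypothetical helper: min-max normalize over the last two dims."""
    lo = cam.amin(dim=(-2, -1), keepdim=True)
    hi = cam.amax(dim=(-2, -1), keepdim=True)
    # Out of place: a new tensor is returned and the input is left untouched.
    # eps keeps the denominator non-zero when hi == lo (zero variance).
    return (cam - lo) / (hi - lo + eps)


# A zero-variance activation map, similar to what a masked input may produce
flat_map = torch.full((1, 3, 4, 4), 0.5)

# Naive normalization: 0 / 0, which yields NaN for tensors
naive = (flat_map - flat_map.min()) / (flat_map.max() - flat_map.min())

# With eps the result is a finite tensor of zeros
safe = normalize_cam(flat_map)

print(torch.isnan(naive).any().item(), torch.isnan(safe).any().item())
```

Because the helper returns a new tensor, the stored activations keep their original values even if the normalized copy were to degenerate.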

lars-nieradzik commented, Sep 17, 2022

I can confirm this NaN bug. It mostly occurs for ScoreCAM.

from torchvision.io.image import read_image
from torchvision.transforms.functional import normalize, resize
from torchvision.models import resnet18
from torchcam.methods import ScoreCAM

model = resnet18(pretrained=True).eval()
cam_extractor = ScoreCAM(model)
# Get your input
img = read_image("border-collie.jpg")
# Preprocess it for your chosen model
input_tensor = normalize(resize(img, (224, 224)) / 255., [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

# Preprocess your data and feed it to the model
out = model(input_tensor.unsqueeze(0))
# Retrieve the CAM by passing the class index and the model output
activation_map = cam_extractor(out.squeeze(0).argmax().item(), out)

print(activation_map)

The error happens as follows:

core.py --> for weight, activation in zip(weights, self.hook_a): --> the variable self.hook_a contains NaN
torch.nansum(weight * activation, dim=1) --> returns a zero tensor
self._normalize(cam) --> division by zero (because minimum == maximum)
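The chain above can be reproduced in isolation with plain tensors. This is only a sketch: the shapes, the weight values and the all-NaN activation are made up for illustration and do not come from torchcam itself.

```python
import torch

# Hypothetical per-channel weights, broadcastable over the activation
weight = torch.tensor([0.7, 0.3]).view(1, 2, 1, 1)

# Stand-in for a hook_a that has been polluted with NaNs
activation = torch.full((1, 2, 3, 3), float("nan"))

# nansum treats NaN as zero, so the weighted sum collapses to all zeros
cam = torch.nansum(weight * activation, dim=1)

# minimum == maximum, so the normalization denominator is zero
span = cam.max() - cam.min()

# 0 / 0 on tensors raises no exception: it silently produces NaN
normalized = (cam - cam.min()) / span

print(cam.abs().sum().item(), span.item(), torch.isnan(normalized).all().item())
```

Note that no exception is raised at any step, which is why the NaNs only become visible when the final map is printed.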

For other (model, CAM) combinations, numerical instabilities may also occur.

800 images: