
Problem with inputs using Integrated Gradients and model with embedding layers

See original GitHub issue

I’m wondering how to use Integrated Gradients with the following model, since it has embedding layers:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularModel(nn.Module):
    def __init__(self, embedding_sizes, n_cont):
        super().__init__()
        # One embedding per categorical feature: (num_categories, embedding_dim) pairs.
        self.embeddings = nn.ModuleList([nn.Embedding(categories, size) for categories, size in embedding_sizes])
        n_emb = sum(e.embedding_dim for e in self.embeddings)
        self.n_emb, self.n_cont = n_emb, n_cont
        self.lin1 = nn.Linear(self.n_emb + self.n_cont, 100)
        self.lin2 = nn.Linear(100, 50)
        self.lin3 = nn.Linear(50, 5)
        self.bn1 = nn.BatchNorm1d(self.n_cont)
        self.bn2 = nn.BatchNorm1d(100)
        self.bn3 = nn.BatchNorm1d(50)
        self.emb_drop = nn.Dropout(0.2)
        self.drops = nn.Dropout(0.1)

    def forward(self, x_cat, x_cont):
        # Look up each categorical column in its embedding table, then concatenate.
        x = [e(x_cat[:, i]) for i, e in enumerate(self.embeddings)]
        x = torch.cat(x, 1)
        x = self.emb_drop(x)
        x2 = self.bn1(x_cont)
        x = torch.cat([x, x2], 1)
        x = F.relu(self.lin1(x))
        x = self.drops(x)
        x = self.bn2(x)
        x = F.relu(self.lin2(x))
        x = self.drops(x)
        x = self.bn3(x)
        x = self.lin3(x)
        return x

The main issue is in passing the inputs to

from captum.attr import IntegratedGradients

ig = IntegratedGradients(model)
ig.attribute(inputs)

after which I get the error:

AssertionError: Baseline can be provided as a tensor for just one input and broadcasted to the batch or input and baseline must have the same shape or the baseline corresponding to each input tensor must be a scalar. Found baseline: tensor([[8.0000e+00, 1.0138e+03, 8.2027e+01,  ..., 1.4000e+01, 0.0000e+00,
         0.0000e+00],
        [8.0000e+00, 1.0161e+03, 8.7000e+01,  ..., 6.6700e+01, 0.0000e+00,
         0.0000e+00],
        [1.0000e+00, 1.0226e+03, 4.8000e+01,  ..., 2.4700e+01, 0.0000e+00,
         0.0000e+00],
        ...,
        [0.0000e+00, 1.0208e+03, 8.2000e+01,  ..., 1.2400e+01, 0.0000e+00,
         0.0000e+00],
        [7.0000e+00, 1.0142e+03, 9.8000e+01,  ..., 1.1000e+00, 1.0000e+00,
         0.0000e+00],
        [0.0000e+00, 1.0230e+03, 7.6000e+01,  ..., 3.6900e+01, 0.0000e+00,
         0.0000e+00]]) and input: tensor([[ 4,  8,  0,  3],
        [ 5,  8,  1, 15],
        [ 2, 13,  0, 29],
        ...,
        [ 5,  1,  0, 21],
        [ 0, 23,  0,  5],
        [ 5,  5,  4, 11]])
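
The assertion occurs because Integrated Gradients interpolates along a straight line between a baseline and the input, which is not meaningful for integer category indices: there is no baseline of the right shape and dtype for the embedding inputs. The approach suggested later in the thread is to attribute over the embedding outputs (float tensors) rather than the raw indices. A minimal sketch of that idea, assuming the TabularModel above and input batches x_cat and x_cont (the wrapper class name is hypothetical):

import torch
from captum.attr import IntegratedGradients

# Hypothetical wrapper: consumes pre-computed embeddings (float tensors)
# instead of integer category indices, so IG can interpolate toward a baseline.
class TabularModelFromEmbeddings(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x_emb, x_cont):
        m = self.model
        x = m.emb_drop(x_emb)                  # x_emb: (batch, n_emb)
        x = torch.cat([x, m.bn1(x_cont)], 1)   # same path as TabularModel.forward
        x = m.drops(torch.relu(m.lin1(x)))
        x = m.bn2(x)
        x = m.drops(torch.relu(m.lin2(x)))
        x = m.bn3(x)
        return m.lin3(x)

# Pre-compute the concatenated embedding outputs once.
embeddings_cat = torch.cat(
    [e(x_cat[:, i]) for i, e in enumerate(model.embeddings)], dim=1)

ig = IntegratedGradients(TabularModelFromEmbeddings(model).eval())
# Tuple inputs yield a tuple of attributions, one per input tensor.
attr_emb, attr_cont = ig.attribute(inputs=(embeddings_cat, x_cont), target=0)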

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 17 (6 by maintainers)

Top GitHub Comments

1 reaction
karppmik commented, Aug 26, 2020

Hi @vivekmig, thanks a lot for the reply! I was able to do the attributions with that approach. How would I go about visualising the results?

Referring to this tutorial, https://captum.ai/tutorials/House_Prices_Regression_Interpret, I don’t quite understand what the following lines do:

ig_attr_test_sum = ig_attr_test.detach().numpy().sum(0)
ig_attr_test_norm_sum = ig_attr_test_sum / np.linalg.norm(ig_attr_test_sum, ord=1)

In my case, ig_attr_test = ig.attribute(inputs=(embeddings_cat, test_set.features_cont), target=0) returns a tuple, and I’m not sure how to deal with that.
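
For reference: .sum(0) aggregates the per-sample attributions over the batch dimension, and dividing by the L1 norm rescales them so their absolute values sum to 1, which makes features comparable in a bar chart. Because attribute was called with a tuple of inputs, it returns a tuple of attribution tensors, one per input, and each element can be aggregated the same way before concatenating. A minimal sketch, assuming the names from the comment above:

import numpy as np

# ig_attr_test is a tuple: one attribution tensor per input tensor.
attr_emb, attr_cont = ig_attr_test

emb_sum = attr_emb.detach().numpy().sum(0)    # sum over the batch dimension
cont_sum = attr_cont.detach().numpy().sum(0)

# One vector with an entry per feature column, L1-normalized for plotting,
# mirroring the House_Prices_Regression_Interpret tutorial.
all_sum = np.concatenate([emb_sum, cont_sum])
all_norm = all_sum / np.linalg.norm(all_sum, ord=1)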

0 reactions
JavierPerez21 commented, Jun 15, 2021

I am experiencing the same problem when trying to generate attributions for the PyTorch ResNet model from https://pytorch.org/vision/stable/models.html. I have been trying to figure out whether the forward function in https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py needs to be changed, but I think gradients can be computed for all of its operations. Here’s the code for the ResNet class:

import torch
import torch.nn as nn
from torch import Tensor
from typing import Callable, List, Optional, Type, Union
from torchvision.models.resnet import BasicBlock, Bottleneck, conv1x1

class ResNet(nn.Module):
    def __init__(
        self,
        block: Type[Union[BasicBlock, Bottleneck]],
        layers: List[int],
        num_classes: int = 1000,
        zero_init_residual: bool = False,
        groups: int = 1,
        width_per_group: int = 64,
        replace_stride_with_dilation: Optional[List[bool]] = None,
        norm_layer: Optional[Callable[..., nn.Module]] = None
    ) -> None:
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)  # type: ignore[arg-type]
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)  # type: ignore[arg-type]

    def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]], planes: int, blocks: int,
                    stride: int = 1, dilate: bool = False) -> nn.Sequential:
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def _forward_impl(self, x: Tensor) -> Tensor:
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)

and I am trying to calculate the attributions with:

def attribute(attr_method, x, y, model, **kwargs):
    model.zero_grad()
    tensor_attributions = attr_method.attribute(inputs=x, target=y, **kwargs)
    return tensor_attributions

ig = ca.IntegratedGradients(nominal_model)
attributions = attribute(ig, x, y, nominal_model, baselines=torch.zeros_like(x), n_steps=20)

which raises:

/usr/local/lib/python3.7/dist-packages/captum/_utils/common.py in _validate_input(inputs, baselines, draw_baseline_from_distrib)
            " same shape or the baseline corresponding to each input tensor"
            " must be a scalar. Found baseline: {} and input: {}".format(
                baseline, input
            )
        )

AssertionError: Baseline can be provided as a tensor for just one input and broadcasted to the batch or input and baseline must have the same shape or the baseline corresponding to each input tensor must be a scalar.

Any ideas on how to fix this?
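
For what it’s worth, this assertion usually means the inputs and baselines disagree in structure: ResNet takes a single float tensor, so if x is actually a tuple or list of tensors, the baselines argument must mirror that structure element by element. A minimal defensive sketch, assuming the ca alias and the names from the snippet above:

import torch
import captum.attr as ca  # alias assumed from the snippet above

# Baselines must mirror the structure of the inputs exactly.
if isinstance(x, (tuple, list)):
    baselines = tuple(torch.zeros_like(t) for t in x)
else:
    baselines = torch.zeros_like(x)

ig = ca.IntegratedGradients(nominal_model)
attributions = ig.attribute(inputs=x, target=y, baselines=baselines, n_steps=20)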
