
Question: Integrated Gradient w/ Embedded Categorical Data


Hi Everyone,

Question:

How can I apply Integrated Gradients to a dataset with numerical and embedded categorical data?

I am somewhat of a beginner with PyTorch, and the available resources just aren’t clicking with my use case. The ultimate goal is to plot the feature importance of a model, but I am stuck on calculating the attributions. Any help or guidance would be much appreciated.

What I’ve reviewed:

(These resources all use very different data structures (images/sentences) and are hard for a beginner to translate to a simpler tabular numerical/categorical dataset.)

My Problem:

Tutorial/Full Code Dataset

Model:

Model(
  (all_embeddings): ModuleList(
    (0): Embedding(3, 2)
    (1): Embedding(2, 1)
    (2): Embedding(2, 1)
    (3): Embedding(2, 1)
  )
  (embedding_dropout): Dropout(p=0.4, inplace=False)
  (batch_norm_num): BatchNorm1d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layers): Sequential(
    (0): Linear(in_features=11, out_features=200, bias=True)
    (1): ReLU(inplace=True)
    (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.4, inplace=False)
    (4): Linear(in_features=200, out_features=100, bias=True)
    (5): ReLU(inplace=True)
    (6): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.4, inplace=False)
    (8): Linear(in_features=100, out_features=50, bias=True)
    (9): ReLU(inplace=True)
    (10): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): Dropout(p=0.4, inplace=False)
    (12): Linear(in_features=50, out_features=2, bias=True)
  )
)

Categorical Data Example:

tensor([[0, 0, 1, 1],
        [2, 0, 0, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
        [2, 0, 1, 1]])

Numerical Data Example:

tensor([[6.1900e+02, 4.2000e+01, 2.0000e+00, 0.0000e+00, 1.0000e+00, 1.0135e+05],
        [6.0800e+02, 4.1000e+01, 1.0000e+00, 8.3808e+04, 1.0000e+00, 1.1254e+05],
        [5.0200e+02, 4.2000e+01, 8.0000e+00, 1.5966e+05, 3.0000e+00, 1.1393e+05],
        [6.9900e+02, 3.9000e+01, 1.0000e+00, 0.0000e+00, 2.0000e+00, 9.3827e+04],
        [8.5000e+02, 4.3000e+01, 2.0000e+00, 1.2551e+05, 1.0000e+00, 7.9084e+04]])

Output Data Example:

tensor([1, 0, 1, 0, 0])
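As a quick sanity check on the shapes: each of the four categorical columns goes through its own embedding in the ModuleList above, giving 2 + 1 + 1 + 1 = 5 embedded values per row; concatenated with the 6 numerical columns (after BatchNorm1d(6)), that yields the 11 in_features of the first Linear layer.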

My Failing Attempt at Attribution

from captum.attr import IntegratedGradients
from captum.attr import configure_interpretable_embedding_layer

interpretable_embedding = configure_interpretable_embedding_layer(model, 'all_embeddings')

# This call raised "NotImplementedError"
cat_input_embedding = interpretable_embedding.indices_to_embeddings(categorical_train_data).unsqueeze(0)

ig = IntegratedGradients(model)

ig_attr_train = ig.attribute(
    inputs=(numerical_train_data, categorical_train_data),
    baselines=(numerical_train_data * 0.0, cat_input_embedding),
    target=train_outputs,
    n_steps=50,
)
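A likely cause of the NotImplementedError: configure_interpretable_embedding_layer needs a module that can be called like a layer, but all_embeddings is an nn.ModuleList, which is only a container and defines no forward(). A minimal sketch of that failure mode, using only the layer types from the model above:

import torch
import torch.nn as nn

# nn.ModuleList is just a container; calling it like a layer raises NotImplementedError
embs = nn.ModuleList([nn.Embedding(3, 2), nn.Embedding(2, 1)])
try:
    embs(torch.tensor([0, 1]))
except NotImplementedError:
    print("ModuleList has no forward(); wrap the embeddings in a module that does")

The accepted answer below does exactly that: it wraps the per-column embeddings in a module with a real forward().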

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

reggievick commented, Aug 19, 2020 (1 reaction)

Awesome, that is much cleaner. I was planning on refactoring once I understood it, but you’ve nailed it here. Thanks so much @NarineK!

NarineK commented, Aug 19, 2020 (1 reaction)

Yeah, I think we can clean things up and make it more modular with something like this:

import torch
import torch.nn as nn

class CombinedEmbedding(nn.Module):
    """Wraps the per-column embeddings in a single module that has a forward()."""
    def __init__(self, embedding_size):
        super().__init__()
        self.all_embeddings = nn.ModuleList([nn.Embedding(ni, nf) for ni, nf in embedding_size])

    def forward(self, x_categorical):
        # Embed each categorical column with its own embedding, then concatenate along dim 1
        embeddings = []
        for i, e in enumerate(self.all_embeddings):
            embeddings.append(e(x_categorical[:, i]))
        x = torch.cat(embeddings, 1)
        return x

class Model(nn.Module):

    def __init__(self, embedding_size, num_numerical_cols, output_size, layers, p=0.4):
        super().__init__()
        # Previously: self.all_embeddings = nn.ModuleList([...]) lived directly on the model
        self.all_embedding = CombinedEmbedding(embedding_size)

        self.embedding_dropout = nn.Dropout(p)
        self.batch_norm_num = nn.BatchNorm1d(num_numerical_cols)

        all_layers = []
        num_categorical_cols = sum(nf for ni, nf in embedding_size)
        input_size = num_categorical_cols + num_numerical_cols

        for i in layers:
            all_layers.append(nn.Linear(input_size, i))
            all_layers.append(nn.ReLU(inplace=True))
            all_layers.append(nn.BatchNorm1d(i))
            all_layers.append(nn.Dropout(p))
            input_size = i

        all_layers.append(nn.Linear(layers[-1], output_size))

        self.layers = nn.Sequential(*all_layers)

    def forward(self, x_categorical, x_numerical):
        x = self.all_embedding(x_categorical)
        x = self.embedding_dropout(x)

        x_numerical = self.batch_norm_num(x_numerical)
        x = torch.cat([x, x_numerical], 1)
        x = self.layers(x)
        return x
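For reference, instantiating the refactored model to match the printout above would look roughly like this (the variable name is a placeholder; the sizes are read off the printed model):

# Embedding sizes, numerical width, output size and hidden layers taken from the printed model
embedding_size = [(3, 2), (2, 1), (2, 1), (2, 1)]
model = Model(embedding_size, num_numerical_cols=6, output_size=2, layers=[200, 100, 50], p=0.4)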

Here is all you need for interpretability:


from captum.attr import IntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

# Temporarily swap 'all_embedding' for an identity wrapper so embeddings can be passed in directly
interpretable_embedding = configure_interpretable_embedding_layer(model, 'all_embedding')

# Pre-compute the embedded representation of the categorical inputs
emb = interpretable_embedding.indices_to_embeddings(categorical_test_data)

ig = IntegratedGradients(model)
ig.attribute((emb, numerical_test_data), target=0)

# Put the original embedding layer back when done
remove_interpretable_embedding_layer(model, interpretable_embedding)

I didn’t specify baselines. Feel free to specify them too.
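If you do want explicit baselines (and a single importance value per categorical column for plotting), one possible sketch, assuming the variables above and zero-tensor baselines:

import torch

# Zero tensors as baselines for both inputs (an assumed choice; any reference point works)
attr_cat, attr_num = ig.attribute(
    (emb, numerical_test_data),
    baselines=(torch.zeros_like(emb), torch.zeros_like(numerical_test_data)),
    target=0,
)

# The embedded columns are [2, 1, 1, 1] wide (one slice per categorical feature),
# so sum each slice to get one attribution per original categorical column
cat_attr_per_feature = [chunk.sum(dim=1) for chunk in attr_cat.split([2, 1, 1, 1], dim=1)]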
