
Transformer models above version 4.1.1 can't be wrapped with the configure_interpretable_embedding_layer

See original GitHub issue

🐛 Bug

When using the latest version of the transformers package, models can no longer be wrapped with the configure_interpretable_embedding_layer function. A line in the newer package versions unpacks the input shape as batch_size, seq_length = input_shape, but the input we pass has three dimensions because we pass the embeddings, so it fails with a 'too many values to unpack (expected 2)' error. Is it possible to solve this problem with the latest version of the package?
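A minimal sketch of the failure mode, using hypothetical shapes: the newer library code unpacks the input shape into exactly two values, which works for 2-D token-id input but not for 3-D precomputed embeddings.

```python
# Hypothetical shapes illustrating the unpacking bug; the real shapes
# depend on the model and batch.
input_ids_shape = (8, 128)         # (batch_size, seq_length): token ids
embeddings_shape = (8, 128, 768)   # (batch, seq, hidden): inputs_embeds

batch_size, seq_length = input_ids_shape   # fine for token-id input

try:
    batch_size, seq_length = embeddings_shape  # the failing unpack
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```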

To Reproduce

You can see the issue on this Colab notebook:

https://colab.research.google.com/drive/1-aZ9-Kzkb_BVb-8vcvHBAAYy2iBk0khV?hl=en#scrollTo=ZC9tY3vsuBjL&uniqifier=2

If you uncomment the !pip install transformers==4.1.1 line, which pins the older version, the error does not occur.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

2 reactions
apepa commented, Feb 4, 2021

Thank you @NarineK for the suggestion. I don’t use Integrated Gradients, but showcase other approaches like Saliency and InputXGradient in that tutorial.

I just tried the previous idea using the embeddings parameter and it worked. I did the following:

  1. Create a wrapper for the BERT model that accepts embeddings instead of the ids (this could also be done with a function wrapper):

         class BertWrapper(torch.nn.Module):
             def __init__(self, transformer):
                 super(BertWrapper, self).__init__()
                 self.transformer = transformer

             def forward(self, input, attention_mask=None):
                 return self.transformer(inputs_embeds=input,
                                         attention_mask=attention_mask)['logits']

  2. Extract the BERT embeddings for the token ids:

         input_embeddings = model.transformer.bert.embeddings(input_ids)

  3. Pass the embeddings as input instead of the tokens:

         ablator = Saliency(model)
         # attend to everything except the padding tokens
         additional_args = input_ids != tokenizer.pad_token_id
         attributions = ablator.attribute(input_embeddings,
                                          additional_forward_args=(additional_args,),
                                          target=0)

More importantly, this approach doesn’t need the configure_interpretable_embedding_layer/remove_interpretable_embedding_layer functionality and would not even work if you accidentally used it (which was my mistake last time I tried).

I would even recommend this approach whenever possible – being able to pass either the embeddings or the token ids to a model, rather than relying on the configure_interpretable_embedding_layer/remove_interpretable_embedding_layer functionality, gives the user more control.
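The comment above notes the wrapper could also be a plain function. A hedged sketch of that variant, with `make_embeds_forward` as a hypothetical helper name and a toy stand-in model (no transformers or captum required) to show the signature adaptation:

```python
# Sketch of the "function wrapper" alternative: a closure that feeds its
# first argument to the model as inputs_embeds and unwraps the 'logits'
# entry, matching the forward signature attribution methods expect.
def make_embeds_forward(model):
    """Adapt a model taking inputs_embeds into a (inputs, mask) callable."""
    def forward(input_embeds, attention_mask=None):
        out = model(inputs_embeds=input_embeds,
                    attention_mask=attention_mask)
        return out['logits']
    return forward

# Toy stand-in model: any callable returning a dict with 'logits' works.
def toy_model(inputs_embeds=None, attention_mask=None):
    return {'logits': [len(inputs_embeds)]}  # pretend per-batch logits

forward_fn = make_embeds_forward(toy_model)
print(forward_fn([[0.1, 0.2], [0.3, 0.4]]))  # [2]
```

With a real model, `forward_fn` would be passed to Saliency in place of the BertWrapper instance; the attribution call itself is unchanged.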

1 reaction
NarineK commented, Jan 30, 2021

It would be good to update it. There is a case where I look into token, word and position embeddings separately with the configured layer. We might be able to do that with multi-layer IG right now, and the layer conductance case would need a fix too.
