
Transformer models above version 4.1.1 can't be wrapped with the configure_interpretable_embedding_layer

See original GitHub issue

🐛 Bug

When using the latest version of the transformers package, models can no longer be wrapped with the configure_interpretable_embedding_layer function. A line in the newer package versions unpacks the input shape as batch_size, seq_length = input_shape, but the input we pass has three dimensions because we pass the embeddings, so it fails with a 'too many values to unpack (expected 2)' error. Is it possible to solve this problem with the latest version of the package?
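A minimal sketch of the failure mode, using hypothetical shapes: the newer library code unpacks the input shape into exactly two values, which works for 2-D token-id input but not for 3-D precomputed embeddings.

```python
# Hypothetical shapes illustrating the unpacking bug; the real shapes
# depend on the model and batch.
input_ids_shape = (8, 128)         # (batch_size, seq_length): token ids
embeddings_shape = (8, 128, 768)   # (batch, seq, hidden): inputs_embeds

batch_size, seq_length = input_ids_shape   # fine for token-id input

try:
    batch_size, seq_length = embeddings_shape  # the failing unpack
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```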

To Reproduce

You can see the issue on this Colab notebook:

https://colab.research.google.com/drive/1-aZ9-Kzkb_BVb-8vcvHBAAYy2iBk0khV?hl=en#scrollTo=ZC9tY3vsuBjL&uniqifier=2

If you uncomment the !pip install transformers==4.1.1 line, which pins the older version, the error does not occur.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

2 reactions
apepa commented, Feb 4, 2021

Thank you @NarineK for the suggestion. I don’t use Integrated Gradients, but showcase other approaches like Saliency and InputXGradient in that tutorial.

I just tried the previous idea using the embeddings parameter and it worked. I did the following:

  1. Create a wrapper for the BERT model that accepts embeddings instead of the ids (this could also be done with a function wrapper):

         class BertWrapper(torch.nn.Module):
             def __init__(self, transformer):
                 super(BertWrapper, self).__init__()
                 self.transformer = transformer

             def forward(self, input, attention_mask=None):
                 return self.transformer(inputs_embeds=input,
                                         attention_mask=attention_mask)['logits']

  2. Extract the BERT embeddings for the token ids:

         input_embeddings = model.transformer.bert.embeddings(input_ids)

  3. Pass the embeddings as input instead of the tokens:

         ablator = Saliency(model)
         # attend to everything except the padding tokens
         additional_args = input_ids != tokenizer.pad_token_id
         attributions = ablator.attribute(input_embeddings,
                                          additional_forward_args=(additional_args,),
                                          target=0)

More importantly, this approach doesn’t need the configure_interpretable_embedding_layer/remove_interpretable_embedding_layer functionality and would not even work if you accidentally used it (which was my mistake last time I tried).

I would even recommend this approach whenever possible – being able to pass either the embeddings or the token ids to a model, rather than relying on the configure_interpretable_embedding_layer/remove_interpretable_embedding_layer functionality, gives the user more control.
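The comment above notes the wrapper could also be a plain function. A hedged sketch of that variant, with `make_embeds_forward` as a hypothetical helper name and a toy stand-in model (no transformers or captum required) to show the signature adaptation:

```python
# Sketch of the "function wrapper" alternative: a closure that feeds its
# first argument to the model as inputs_embeds and unwraps the 'logits'
# entry, matching the forward signature attribution methods expect.
def make_embeds_forward(model):
    """Adapt a model taking inputs_embeds into a (inputs, mask) callable."""
    def forward(input_embeds, attention_mask=None):
        out = model(inputs_embeds=input_embeds,
                    attention_mask=attention_mask)
        return out['logits']
    return forward

# Toy stand-in model: any callable returning a dict with 'logits' works.
def toy_model(inputs_embeds=None, attention_mask=None):
    return {'logits': [len(inputs_embeds)]}  # pretend per-batch logits

forward_fn = make_embeds_forward(toy_model)
print(forward_fn([[0.1, 0.2], [0.3, 0.4]]))  # [2]
```

With a real model, `forward_fn` would be passed to Saliency in place of the BertWrapper instance; the attribution call itself is unchanged.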

1 reaction
NarineK commented, Jan 30, 2021

It would be good to update it. There is a case where I look into token, word and position embeddings separately with the configured layer. We might be able to do that with multi-layer IG right now, and the layer conductance case would need a fix too.
