Why choose `model.embeddings` over `model.embeddings.word_embeddings` when calculating LayerIntegratedGradients for BERT-like models?
Hi,
I've been playing with the idea of only using a model's `embeddings.word_embeddings` when calculating LIG for BERT-like models, rather than the full `embeddings` module. However, I have noticed that with some models the attribution scores can change quite a bit for certain words.

I know that for a model like BERT the embeddings are often broken down into `word_embeddings`, `token_type_embeddings`, and `position_embeddings`. However, the only embedding layer that returns meaningful attribution results is `word_embeddings`; if I use the token type or position embeddings I get `nan` for all word attributions.

What I was wondering is whether there is any reason for models like BERT, DistilBERT, etc. to calculate LIG attributions over all the embeddings rather than just the word embeddings?

Let me know if you need any clarification on what I'm trying to ask; I hope it's not too vague. Thanks!
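For reference, a minimal sketch of the two variants being compared. The checkpoint name, forward wrapper, pad-token baseline, and target class below are illustrative assumptions, not part of the original question:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from captum.attr import LayerIntegratedGradients

# Hypothetical setup, just to make the two layer choices concrete.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def forward_func(input_ids, attention_mask):
    return model(input_ids, attention_mask=attention_mask).logits

enc = tokenizer("a very enjoyable film", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
ref_input_ids = torch.full_like(input_ids, tokenizer.pad_token_id)

# Option A: attribute over the whole embeddings module
# (word + position + token-type embeddings combined, then LayerNorm/dropout).
lig_full = LayerIntegratedGradients(forward_func, model.bert.embeddings)

# Option B: attribute over the word-embedding lookup only.
lig_word = LayerIntegratedGradients(forward_func,
                                    model.bert.embeddings.word_embeddings)

attrs_full = lig_full.attribute(input_ids, baselines=ref_input_ids,
                                additional_forward_args=(attention_mask,),
                                target=0)
attrs_word = lig_word.attribute(input_ids, baselines=ref_input_ids,
                                additional_forward_args=(attention_mask,),
                                target=0)

# Per-token scores; the two variants generally do not match exactly.
print(attrs_full.sum(dim=-1))
print(attrs_word.sum(dim=-1))
```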
@NarineK Ah, this worked perfectly for me; I should have seen that `inputs` can take multiple values. Thanks for helping me with this, it's a huge help, much appreciated! Do you think the original SQuAD example notebook would benefit from this addition? I could put together a PR if you think so.
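The multi-input setup referred to above presumably looks something like the following sketch, which uses Captum's support for passing a list of layers to `LayerIntegratedGradients`; the checkpoint, baselines, and target are placeholder assumptions:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from captum.attr import LayerIntegratedGradients

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

enc = tokenizer("a very enjoyable film", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
seq_len = input_ids.size(1)

token_type_ids = torch.zeros_like(input_ids)
position_ids = torch.arange(seq_len, dtype=torch.long).unsqueeze(0)

# Baselines: pad tokens for the words, zeros for the other two id tensors.
ref_input_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
ref_token_type_ids = torch.zeros_like(token_type_ids)
ref_position_ids = torch.zeros_like(position_ids)

def forward_func(input_ids, token_type_ids, position_ids, attention_mask):
    return model(input_ids,
                 token_type_ids=token_type_ids,
                 position_ids=position_ids,
                 attention_mask=attention_mask).logits

# One layer per input tensor; attributions come back in the same order.
lig = LayerIntegratedGradients(
    forward_func,
    [model.bert.embeddings.word_embeddings,
     model.bert.embeddings.token_type_embeddings,
     model.bert.embeddings.position_embeddings])

word_attrs, type_attrs, pos_attrs = lig.attribute(
    inputs=(input_ids, token_type_ids, position_ids),
    baselines=(ref_input_ids, ref_token_type_ids, ref_position_ids),
    additional_forward_args=(attention_mask,),
    target=0)

print(word_attrs.sum(dim=-1))  # one score per token from the word embeddings
```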
@NarineK thanks for the response. I had seen that example before, but I think I've worked out why I was getting such weird results: I wasn't testing very thoroughly, apparently. I've been using the DistilBERT model as an example for a lot of these tasks, primarily because it's smaller and faster to test on. The problem is that `distilbert` doesn't accept `position_ids` or `token_ids` as inputs for its forward pass, unlike most other model configs in the HF library. Thanks again for the response!
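A short sketch of the DistilBERT case under the same assumptions: since its forward pass takes neither of those extra id tensors, only the single-input, word-embeddings form applies. The checkpoint name and target class are illustrative:

```python
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
from captum.attr import LayerIntegratedGradients

# Illustrative checkpoint; any DistilBERT classifier behaves the same way.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = DistilBertTokenizer.from_pretrained(name)
model = DistilBertForSequenceClassification.from_pretrained(name)
model.eval()

# DistilBERT's forward pass does not take position or token-type id tensors,
# so the multi-input setup from the BERT sketch does not apply here.
def forward_func(input_ids, attention_mask):
    return model(input_ids, attention_mask=attention_mask).logits

enc = tokenizer("a very enjoyable film", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
ref_input_ids = torch.full_like(input_ids, tokenizer.pad_token_id)

lig = LayerIntegratedGradients(forward_func,
                               model.distilbert.embeddings.word_embeddings)
attrs = lig.attribute(input_ids,
                      baselines=ref_input_ids,
                      additional_forward_args=(attention_mask,),
                      target=1)  # positive-sentiment class in this checkpoint
print(attrs.sum(dim=-1))
```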