
Why choose model.embeddings over model.embeddings.word_embeddings when calculating LayerIntegratedGradients for BERT-like models?

See original GitHub issue

Hi,

I’ve been playing with the idea of only using a model’s embeddings.word_embeddings when calculating LayerIntegratedGradients (LIG) for BERT-like models, rather than the full embeddings module. However, I have noticed that with some models the attribution scores can change quite a bit for certain words.

I know that for a model like BERT the embeddings are often broken down into word_embeddings, token_type_embeddings, and position_embeddings. However, the only embedding layer that returns meaningful attribution results is word_embeddings; if I use the token type or position embeddings I get NaN for all word attributions.

What I was wondering is whether there is any reason for models like BERT, DistilBERT, etc. to calculate LIG attributions across all the embeddings rather than just the word embeddings?

Let me know if you need any clarification on what I’m trying to ask; I hope it’s not too vague. Thanks!
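
To make the two choices concrete, here is a minimal sketch (mine, not from the issue) of pointing LayerIntegratedGradients at either the full embedding module or at word_embeddings only. The checkpoint name, the predict() wrapper, the example sentence, and target=0 are illustrative assumptions.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from captum.attr import LayerIntegratedGradients

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def predict(input_ids, attention_mask):
    # Forward wrapper returning the logits Captum attributes against.
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

enc = tokenizer("a tough bug to hunt down", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

# Baseline: [PAD] everywhere except the special tokens.
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
baseline_ids[0, 0] = tokenizer.cls_token_id
baseline_ids[0, -1] = tokenizer.sep_token_id

# Option A: attribute w.r.t. the full embedding block (word + position + token type).
lig_full = LayerIntegratedGradients(predict, model.bert.embeddings)
attr_full = lig_full.attribute(input_ids, baselines=baseline_ids,
                               additional_forward_args=(attention_mask,), target=0)

# Option B: attribute w.r.t. the word embeddings only.
lig_word = LayerIntegratedGradients(predict, model.bert.embeddings.word_embeddings)
attr_word = lig_word.attribute(input_ids, baselines=baseline_ids,
                               additional_forward_args=(attention_mask,), target=0)

# Per-token scores; as described above, the two choices can differ noticeably.
print(attr_full.sum(dim=-1).squeeze(0))
print(attr_word.sum(dim=-1).squeeze(0))
```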

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 15 (15 by maintainers)

Top GitHub Comments

2 reactions
cdpierse commented on Mar 6, 2021

@NarineK Ah, this worked perfectly for me; I should have seen that inputs can take multiple values. Thanks for helping me with this, it’s a huge help. Much appreciated! Do you think the original SQuAD example or notebook would benefit from this addition? I could put together a PR if you think so.
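
For context, my reading of the suggestion being thanked here (a hedged sketch, not the maintainers’ exact code) is the multi-input pattern from Captum’s BERT tutorial: pass input_ids, token_type_ids, and position_ids as a tuple of inputs and give LayerIntegratedGradients the matching list of sub-embedding layers. The model name, forward wrapper, and baselines below are assumptions.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from captum.attr import LayerIntegratedGradients

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def forward_func(input_ids, token_type_ids, position_ids, attention_mask):
    # All three id tensors are passed explicitly so each can be attributed.
    return model(input_ids=input_ids, token_type_ids=token_type_ids,
                 position_ids=position_ids, attention_mask=attention_mask).logits

enc = tokenizer("a tough bug to hunt down", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
token_type_ids = torch.zeros_like(input_ids)
position_ids = torch.arange(input_ids.size(1)).unsqueeze(0)

ref_input_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
ref_token_type_ids = torch.zeros_like(token_type_ids)
ref_position_ids = torch.zeros_like(position_ids)

# One layer per input: word, token type, and position embeddings.
lig = LayerIntegratedGradients(
    forward_func,
    [model.bert.embeddings.word_embeddings,
     model.bert.embeddings.token_type_embeddings,
     model.bert.embeddings.position_embeddings],
)
attr_word, attr_type, attr_pos = lig.attribute(
    inputs=(input_ids, token_type_ids, position_ids),
    baselines=(ref_input_ids, ref_token_type_ids, ref_position_ids),
    additional_forward_args=(attention_mask,),
    target=0,
)
```

Each returned attribution then corresponds to one sub-embedding layer, which is presumably why the per-component scores come out meaningful rather than NaN in this setup.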

2 reactions
cdpierse commented on Mar 2, 2021

@NarineK thanks for the response. I had seen that example before, but I think I’ve figured out why I was getting such weird results; I wasn’t testing very thoroughly, apparently. I’ve been using the DistilBERT model as an example model for a lot of these tasks, primarily because it’s smaller and faster to test on. The problem is that DistilBERT doesn’t accept position_ids or token_type_ids as inputs for its forward pass, unlike most other model configs in the HF library. Thanks again for the response!
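
A quick way to confirm the difference being described (an illustrative check of mine, not from the thread): BERT’s forward() signature includes token_type_ids and position_ids, while DistilBERT’s does not, so the multi-input recipe above only applies to the word embeddings for DistilBERT.

```python
import inspect
from transformers import BertModel, DistilBertModel

# BERT exposes token_type_ids and position_ids; DistilBERT does not.
print(inspect.signature(BertModel.forward))
print(inspect.signature(DistilBertModel.forward))
```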


Top Results From Across the Web

  • BERT Word Embeddings Tutorial - Chris McCormick: In this post, I take an in-depth look at word embeddings produced by ... BERT offers an advantage over models like Word2Vec, because...
  • Interpreting BERT Models (Part 1) - Captum: We show how to use interpretation hooks to examine and better understand embeddings, sub-embeddings, BERT, and attention layers.
  • BERT Model Embeddings aren't as good as you think: What's more, the performance drops even further when you mix different input languages to calculate similarity. Models like mBERT predict vector ...
  • Difference between Word2Vec and BERT - The Startup: BERT model explicitly takes as input the position (index) of each word in the sentence before calculating its embedding.
  • Information Leakage in Embedding Models - arXiv: Attributes such as authorship of text can be easily extracted by training an inference model on just a handful of labeled embedding vectors....
