Interpreting a fine-tuned BERT model using LIME
Thanks for this amazing work. I am trying to interpret a fine-tuned BERT model built with the Transformers framework. There seems to be a tokenization issue when I try to use LIME with BERT. Here is the error I am getting:
Traceback (most recent call last):
  File "src/predict.py", line 351, in <module>
    exp = explainer.explain_instance(s, prediction.predictor, num_features=6)
  File "/home/ramesh/.virtualenvs/transformer-env/lib/python3.6/site-packages/lime/lime_text.py", line 417, in explain_instance
    distance_metric=distance_metric)
  File "/home/ramesh/.virtualenvs/transformer-env/lib/python3.6/site-packages/lime/lime_text.py", line 484, in __data_labels_distances
    labels = classifier_fn(inverse_data)
  File "src/predict.py", line 297, in predictor
    input_ids, input_mask, segment_ids = self.convert_text_to_features(text)
  File "src/predict.py", line 135, in convert_text_to_features
    tokens_a = self.tokenizer.tokenize(text_a)
  File "/home/ramesh/.virtualenvs/transformer-env/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 649, in tokenize
    tokenized_text = split_on_tokens(added_tokens, text)
  File "/home/ramesh/.virtualenvs/transformer-env/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 637, in split_on_tokens
    if sub_text not in self.added_tokens_encoder \
TypeError: unhashable type: 'list'
Here is my code:
def predictor(self, text):
    max_seq_length = 128
    input_ids, input_mask, segment_ids = self.convert_text_to_features(text)
    self.model.to(self.device)
    with torch.no_grad():
        outputs = self.model(input_ids, input_mask, segment_ids)
    logits = outputs[0]
    logits = F.softmax(logits, dim=1)
    return logits.numpy()

def convert_text_to_features(self, text_a, text_b=None):
    features = []
    cls_token = self.tokenizer.cls_token
    sep_token = self.tokenizer.sep_token
    cls_token_at_end = False
    sequence_a_segment_id = 0
    sequence_b_segment_id = 1
    cls_token_segment_id = 1
    pad_token_segment_id = 0
    mask_padding_with_zero = True
    pad_token = 0

    tokens_a = self.tokenizer.tokenize(text_a)
    tokens_b = None
    self._truncate_seq_pair(tokens_a, self.max_seq_length - 2)

    tokens = tokens_a + [sep_token]
    segment_ids = [sequence_a_segment_id] * len(tokens)
    if tokens_b:
        tokens += tokens_b + [sep_token]
        segment_ids += [sequence_b_segment_id] * (len(tokens_b) + 1)

    tokens = [cls_token] + tokens
    segment_ids = [cls_token_segment_id] + segment_ids

    input_ids = self.tokenizer.convert_tokens_to_ids(tokens)
    input_mask = [1 if mask_padding_with_zero else 0] * len(input_ids)

    padding_length = self.max_seq_length - len(input_ids)
    input_ids = input_ids + ([pad_token] * padding_length)
    input_mask = input_mask + ([0 if mask_padding_with_zero else 1] * padding_length)
    segment_ids = segment_ids + ([pad_token_segment_id] * padding_length)

    assert len(input_ids) == self.max_seq_length
    assert len(input_mask) == self.max_seq_length
    assert len(segment_ids) == self.max_seq_length

    input_ids = torch.tensor([input_ids], dtype=torch.long).to(self.device)
    input_mask = torch.tensor([input_mask], dtype=torch.long).to(self.device)
    segment_ids = torch.tensor([segment_ids], dtype=torch.long).to(self.device)

    return input_ids, input_mask, segment_ids

if __name__ == '__main__':
    model_path = "models/mrpc"
    bert_model_class = "bert"
    prediction = Prediction(bert_model_class, model_path, lower_case=True, seq_length=128)
    label_names = [0, 1]
    explainer = LimeTextExplainer(class_names=label_names)
    train_df = pd.read_csv("data/train.tsv", sep='\t')
    for example in train_df["string"]:
        exp = explainer.explain_instance(example, prediction.predictor, num_features=6)
        print(exp.as_list())
I have checked issue #356, but I still cannot figure out my problem.
Any leads will be appreciated.
Thank you 😃
Top GitHub Comments
Thanks for your reply. I figured it out. If anyone is interested in interpreting BERT using LIME, here is the correct example 😃
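A minimal sketch of the fixed predictor (assuming the same Prediction class and convert_text_to_features method as above, and that numpy is imported as np; the names texts and probs are illustrative). The root cause of the error: LimeTextExplainer calls classifier_fn with a list of perturbed strings and expects back an (n_samples, n_classes) array of class probabilities, so passing the raw argument straight to tokenizer.tokenize hands it a list, which raises the unhashable type: 'list' error. Iterating over the list resolves it:

import numpy as np

def predictor(self, texts):
    # LIME passes a list of perturbed strings, not a single string,
    # so score each sample separately.
    self.model.to(self.device)
    self.model.eval()
    probs = []
    for text in texts:
        input_ids, input_mask, segment_ids = self.convert_text_to_features(text)
        with torch.no_grad():
            outputs = self.model(input_ids, input_mask, segment_ids)
        # softmax over the classes, then move off the device before numpy conversion
        logits = F.softmax(outputs[0], dim=1)
        probs.append(logits.detach().cpu().numpy()[0])
    # LIME expects an array of shape (n_samples, n_classes)
    return np.array(probs)

With this signature, the explainer.explain_instance(example, prediction.predictor, num_features=6) call in the main block works unchanged, since LIME batches the perturbed samples itself.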
What is "MODEL_CLASSES" in your code?
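For anyone else wondering: in the Transformers example scripts (e.g. run_glue.py), MODEL_CLASSES is a dict mapping a model-type key to its (config, model, tokenizer) classes; whether the author used exactly this definition is an assumption, but a typical BERT-only version looks like:

from transformers import BertConfig, BertForSequenceClassification, BertTokenizer

# maps the model-type string to the classes needed to load that model
MODEL_CLASSES = {
    "bert": (BertConfig, BertForSequenceClassification, BertTokenizer),
}

config_class, model_class, tokenizer_class = MODEL_CLASSES["bert"]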