RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long
Hello,
I am pretty new to captum. I am trying to run the IntegratedGradients method for transformer-based sequence classification. However, I get the following error:
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.FloatTensor instead (while checking arguments for embedding)
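For context, this is the generic PyTorch error raised whenever an embedding layer receives non-integer indices; it can be reproduced with a bare nn.Embedding, for example:

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
float_idx = torch.tensor([[1.0, 2.0, 3.0]])   # float32 "indices"
# emb(float_idx)                              # raises the RuntimeError above
out = emb(float_idx.long())                   # works once indices are int64 (Long)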
Here is my code:
import os
import sys
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertForQuestionAnswering, BertConfig
from transformers import RobertaTokenizer, RobertaForSequenceClassification, RobertaConfig
from sklearn.preprocessing import LabelEncoder
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer
import re
from itertools import cycle          # needed for cycle() in encode_sentences below
from scipy.special import softmax    # needed for softmax() in predict_instance below
device = torch.device("cpu")
# load model
model = RobertaForSequenceClassification.from_pretrained(
    'roberta-base',
    num_labels=4,
    output_attentions=False,
    output_hidden_states=False,
)
model.to(device)
model.eval()
model.zero_grad()
# load tokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
def encode_sentences(tokenizer, article_list, evidence_list, max_len=512):
    """
    Encodes article/evidence pair and returns tensor
    :param tokenizer: tokenizer to use to encode
    :param article_list: list of str
    :param evidence_list: list of str
    :param max_len: int; max length of encoding
    :return: encoded tensors
    """
    input_ids = []
    attention_masks = []
    # ensure length of article_list and evidence_list are equal
    zip_list = zip(cycle(article_list), evidence_list) if len(article_list) < len(evidence_list) else zip(article_list, evidence_list)
    for article, evidence in zip_list:
        encoded_dict = tokenizer.encode_plus(
            text=evidence,               # Sentence to encode.
            text_pair=article,
            add_special_tokens=True,     # Add '[CLS]' and '[SEP]'
            max_length=max_len,          # Pad & truncate all sentences.
            pad_to_max_length=True,
            return_attention_mask=True,  # Construct attn. masks.
            return_tensors='pt',         # Return pytorch tensors.
        )
        input_ids.append(encoded_dict['input_ids'])
        attention_masks.append(encoded_dict['attention_mask'])
    # Convert the lists into tensors.
    input_ids = torch.cat(input_ids, dim=0)
    attention_masks = torch.cat(attention_masks, dim=0)
    return input_ids, attention_masks
def predict_instance(input_ids, attention_masks):
    with torch.no_grad():
        outputs = model(input_ids, token_type_ids=None, attention_mask=attention_masks)
        logits = outputs[0]
        logits = logits.detach().cpu().numpy()
        softmax_values = softmax([logits[0]]).flatten()
    return softmax_values
article = "Manchester United is one of the most successful English football clubs in history. The club has been performing poorly since their star manager Alex Ferguson retired. However recently appointed manager, Ole is showing signs of promise in the rebuilding process."
evidence = "Ole Gunner Solskjaer is doing an excellent job with Manchester United"
input_ids, masks = encode_sentences(tokenizer, [article], [evidence])
# applying integrated gradients to the prediction function and input data point
target_class_index = 0  # class index to explain; pick the label of interest
ig = IntegratedGradients(predict_instance)
attributions, approximation_error = ig.attribute((input_ids, masks), target=target_class_index, return_convergence_delta=True)
# Each returned attribution has the same shape and dimensionality as its input.
assert attributions[0].shape == input_ids.shape
This issue seems very similar to this one: https://github.com/huggingface/transformers/issues/2952
I have tried casting the input_ids and masks to long tensors according to the suggestion there, but it did not help.
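For reference, the cast was roughly the following (both tensors converted to torch.long before calling attribute):

# explicit cast to int64 (Long) before attribution -- this alone did not fix the error
input_ids = input_ids.long()
masks = masks.long()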
Hello @vivekmig, thank you so much for pointing me towards my mistake. I have followed the IMDB example; however, I am now getting the following error:
Here is my updated code:
Hi @sajjadriaj, it seems that the issue here is that you are trying to compute attributions with respect to token indices, but we cannot actually compute gradients with respect to these indices, only to the corresponding embeddings. More information regarding this can be found in this FAQ answer.
To compute Integrated Gradients for tokens, you would need to attribute with respect to word embeddings rather than input indices, which can be done either by overriding the embedding layer or using a LayerAttribution method. Examples of these can be found in our BERT or IMDB tutorials here.
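For a concrete starting point, here is a minimal sketch of the LayerAttribution route using LayerIntegratedGradients, assuming the RoBERTa model and the input_ids/masks produced by the snippet above; target_class_index stands for whichever class you want to explain, and note that the forward function must not run under torch.no_grad(), since gradients are required:

from captum.attr import LayerIntegratedGradients

def forward_func(input_ids, attention_mask):
    # return the logits; gradients must flow, so no torch.no_grad() here
    return model(input_ids, attention_mask=attention_mask)[0]

# attribute with respect to the word-embedding layer instead of the raw token indices
lig = LayerIntegratedGradients(forward_func, model.roberta.embeddings.word_embeddings)

attributions, delta = lig.attribute(
    inputs=input_ids,
    additional_forward_args=(masks,),
    target=target_class_index,
    return_convergence_delta=True,
)

# attributions has shape (batch, seq_len, embedding_dim); sum over the embedding
# dimension to get one score per token
token_attributions = attributions.sum(dim=-1)

The linked tutorials additionally show how to construct baseline (reference) token ids and visualize the resulting per-token scores.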