[Bert Model] ValueError: not enough values to unpack (expected 3, got 2)
🐛 Bug: ValueError: not enough values to unpack (expected 3, got 2)
Information
I am using BERT initialized with 'bert-base-uncased'. As per the documentation, the forward step is supposed to yield 4 outputs:
- last_hidden_state
- pooler_output
- hidden_states
- attentions
But when I initialize BERT and call the forward method, it yields only 2 results. Based on their shapes, I believe they are the last_hidden_state and pooler_output.
self.bert_model = BertModel.from_pretrained('bert-base-uncased')
_, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)
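For completeness, here is a minimal standalone sketch of what I observe (the sentiment/tweet strings below are placeholders, and the shapes assume max_length=50 with a single example):

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = BertModel.from_pretrained('bert-base-uncased')

enc = tokenizer.encode_plus('positive', 'an example tweet',
                            max_length=50,
                            pad_to_max_length=True,
                            return_token_type_ids=True,
                            return_attention_mask=True,
                            return_tensors='pt')
outputs = bert_model(enc['input_ids'],
                     attention_mask=enc['attention_mask'],
                     token_type_ids=enc['token_type_ids'])
print(len(outputs))      # 2, not 4
print(outputs[0].shape)  # torch.Size([1, 50, 768]) -> looks like last_hidden_state
print(outputs[1].shape)  # torch.Size([1, 768])     -> looks like pooler_output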
Error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-69-6d2cb1238cab> in <module>
45 for i, data in enumerate(trainloader):
46 input_ids, attn_mask, token_type_ids = data['tokens'], data['attention_mask'], data['token_type_ids']
---> 47 start_logits, end_logits = model.forward(input_ids, attn_mask, token_type_ids)
48 print(start_logits.shape)
49 print(end_logits.shape)
<ipython-input-69-6d2cb1238cab> in forward(self, input_ids, attn_masks, token_type_ids)
23
24 # Feeding the input to BERT model to obtain hidden_states of all the layers
---> 25 _, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)
26
27 # Shape of hidden_states is (1, 50, 768)
ValueError: not enough values to unpack (expected 3, got 2)
Model I am using (Bert, XLNet …): Bert
Language I am using the model on: English
The problem arises when using:
- the official example scripts: NA
- my own modified scripts: details of the scripts are below.
The task I am working on is:
- an official GLUE/SQuAD task: NA
- my own task or dataset: Fine-tuning for my own task.
To reproduce
Steps to reproduce the behavior:
- Copy and paste the full code below into a notebook.
- Run as is.
Complete code:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import BertTokenizer, BertModel

# train_data (a pandas DataFrame with 'text', 'sentiment' and 'selected_text' columns)
# and BATCH_SIZE are assumed to be defined earlier in the notebook.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Dataset definition
class TweetDataset(Dataset):
    def __init__(self, data, maxlen, tokenizer):
        self.df = data
        self.tokenizer = tokenizer
        self.maxlen = maxlen

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        """
        Returns the token_ids_tensors, attn_mask for the item and text denoting the sentiment.
        :param index:
        :return:
        """
        # Selecting the sentence and label at the specified index in the data frame
        orig_sentence = self.df.iloc[index]['text']
        sentiment = self.df.iloc[index]['sentiment']
        selected_text = self.df.iloc[index]['selected_text']

        # Preprocessing the text to be suitable for BERT.
        # Encode the sentence. Does the following:
        # 1. Inserts the CLS and SEP tokens at the beginning and end of the sentence
        # 2. Generates the attention mask
        # 3. Generates token_type_ids used to differentiate the first part of the sentence from the second
        encoded_dict = self.tokenizer.encode_plus(
            sentiment,
            orig_sentence,
            max_length=self.maxlen,
            truncation_strategy='only_second',
            add_special_tokens=True,
            pad_to_max_length=True,
            return_tensors='pt',
            return_token_type_ids=True,
            return_attention_mask=True
        )
        tokens = encoded_dict['input_ids'][0]
        token_type_ids = encoded_dict['token_type_ids'][0]
        attn_mask = encoded_dict['attention_mask'][0]

        # Determine the beginning and end of the selected phrase
        def phrase_start_finder(sentence, phrase):
            if phrase not in sentence:
                raise ValueError('s2 not substring of s1')
            start = sentence.find(phrase)
            return len(sentence[:start].strip().split(' '))

        def phrase_end_finder(sentence, phrase):
            if phrase not in sentence:
                raise ValueError('s2 not substring of s1')
            return phrase_start_finder(sentence, phrase) + len(phrase.strip().split(' ')) - 1

        start = phrase_start_finder(orig_sentence, selected_text)
        end = phrase_end_finder(orig_sentence, selected_text)

        return {
            'tokens': tokens,
            'attention_mask': attn_mask,
            'token_type_ids': token_type_ids,
            'start': float(start),
            'end': float(end),
            'sentence': orig_sentence,
            'selected_text': selected_text,
            'sentiment': sentiment
        }

# Defining the loader
dataset = TweetDataset(train_data, 50, tokenizer)
trainloader = DataLoader(
    dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=4
)

# Defining the model
class TweetModel(nn.Module):
    def __init__(self, freeze_bert=True):
        super(TweetModel, self).__init__()
        # Instantiating the BERT model object
        self.bert_model = BertModel.from_pretrained('bert-base-uncased')
        # TODO(Viman): Before training on GPUs and finalization, remove this
        # Freeze BERT layers
        # In the first experiment, not training the previous layers
        if freeze_bert:
            for p in self.bert_model.parameters():
                p.requires_grad = False
        # Final layer. Needs two outputs which are supposed to be logits: startIndex and endIndex
        self.dropout = nn.Dropout(0.2)
        # 768 because the output is a vector of size 768 (dimensionality of the encoder layer)
        self.fc = nn.Linear(768, 2)
        # Initialize the fc layer
        nn.init.normal_(self.fc.weight, std=0.02)
        nn.init.normal_(self.fc.bias, 0)

    def forward(self, input_ids, attn_masks, token_type_ids):
        # Feeding the input to the BERT model to obtain hidden_states of all the layers
        _, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)
        # Shape of hidden_states is (1, 50, 768)
        # TODO(Viman): Try mean as opposed to max
        # hidden_states, _ = torch.max(hidden_states, dim=1)
        # last_hidden_state = hidden_states[-1]
        print(hidden_states.shape)
        X = self.dropout(hidden_states)
        logits = self.fc(X)
        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)
        return start_logits, end_logits

model = TweetModel()

# Testing the model forward implementation
for i, data in enumerate(trainloader):
    input_ids, attn_mask, token_type_ids = data['tokens'], data['attention_mask'], data['token_type_ids']
    start_logits, end_logits = model.forward(input_ids, attn_mask, token_type_ids)
    print(start_logits.shape)
    print(end_logits.shape)
    if i == 1:
        break
Expected behavior
The self.bert_model(input_ids, attn_masks, token_type_ids) call should return a tuple containing 4 elements; however, it seems to return only 2.
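A quick check that seems consistent with this; a minimal sketch, assuming a transformers 2.x release where the extra outputs are governed by config flags that default to off:

from transformers import BertConfig

config = BertConfig.from_pretrained('bert-base-uncased')
# If these default to False, only (last_hidden_state, pooler_output) is returned,
# which would explain getting 2 values back instead of 4.
print(config.output_hidden_states)  # False
print(config.output_attentions)     # False

If that is the case, the hidden_states and attentions elements would only appear once these flags are enabled.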
Environment info
- transformers version: 2.9.0
- Platform: Linux-4.19.112+-x86_64-with-debian-buster-sid
- Python version: 3.7.6
- PyTorch version (GPU?): 1.5.0 (False)
- Tensorflow version (GPU?): 2.1.0 (False)
- Using GPU in script?: Not yet
- Using distributed or parallel set-up in script?: No

Also tried with:
- transformers version: 2.11.0
- Platform: Mac / Kaggle notebook (tried in both)
- Python version: 3.7
- PyTorch version (GPU?): No
- Tensorflow version (GPU?): NA
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Top GitHub Comments
I have solved the same issue as you, but in a different situation. You should double-check the batch dimension of your input data: tokens_ids_tensor and attn_mask should be 2D tensors, not 1D. With a batch size of 1 they should have shape [1, seq_len] rather than [seq_len], and for a batch size of n they should have shape [n, seq_len], as in the sketch below.
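A rough illustration of the shapes being described (the token IDs below are arbitrary placeholders):

import torch

# 1D tensor: what a Dataset __getitem__ typically returns for a single example
tokens_1d = torch.tensor([101, 7592, 2088, 102])   # shape: torch.Size([4])
# 2D tensor: what the model expects, with the batch dimension first
tokens_2d = tokens_1d.unsqueeze(0)                 # shape: torch.Size([1, 4])
print(tokens_1d.shape, tokens_2d.shape)
# For a batch size of n the shape should be [n, seq_len]; a DataLoader normally
# adds this batch dimension automatically when it collates the dataset items.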
I set output_hidden_states=True in the forward method, but the same error keeps showing up. I restarted the kernel and double-checked the rest of the code; I am not sure whether it is related to some other parameter in the training. Here is my forward pass:

And here is the training code where the issue occurs:

Are you suspecting other places in the code to be the issue?
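For reference, a minimal sketch of the pattern that should avoid the unpacking error, assuming a transformers 2.x release where output_hidden_states is read from the model config at load time rather than from a forward() argument (the sentiment/tweet strings are placeholders):

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Assumption: the flag is stored in the config when the model is loaded,
# instead of being passed to forward().
bert = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)

enc = tokenizer.encode_plus('neutral', 'just an example tweet',
                            return_token_type_ids=True,
                            return_attention_mask=True,
                            return_tensors='pt')
# With the flag enabled the output tuple has three elements,
# so a three-way unpack like the one in TweetModel.forward no longer fails:
last_hidden_state, pooler_output, hidden_states = bert(
    enc['input_ids'],
    attention_mask=enc['attention_mask'],
    token_type_ids=enc['token_type_ids']
)
print(len(hidden_states))  # 13 for bert-base: embedding output + 12 encoder layers

Later transformers releases also accept output_hidden_states as an argument to forward, but I don't believe that applies to 2.x, which could explain why setting it in the forward method did not help.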