[Bert Model] ValueError: not enough values to unpack (expected 3, got 2)
🐛 Bug: ValueError: not enough values to unpack (expected 3, got 2)
Information
I am using BERT initialized with 'bert-base-uncased'. As per the documentation, the forward step is supposed to yield 4 outputs:
- last_hidden_state
- pooler_output
- hidden_states
- attentions
But when I initialize BERT and call the forward method, it yields only 2 results. Based on their shapes, I believe they are the last_hidden_state and pooler_output.
self.bert_model = BertModel.from_pretrained('bert-base-uncased')
_, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)
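For completeness, here is a minimal standalone sketch of what I observe (the sentiment/tweet strings below are placeholders, and the shapes assume max_length=50 with a single example):

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = BertModel.from_pretrained('bert-base-uncased')

enc = tokenizer.encode_plus('positive', 'an example tweet',
                            max_length=50,
                            pad_to_max_length=True,
                            return_token_type_ids=True,
                            return_attention_mask=True,
                            return_tensors='pt')
outputs = bert_model(enc['input_ids'],
                     attention_mask=enc['attention_mask'],
                     token_type_ids=enc['token_type_ids'])
print(len(outputs))      # 2, not 4
print(outputs[0].shape)  # torch.Size([1, 50, 768]) -> looks like last_hidden_state
print(outputs[1].shape)  # torch.Size([1, 768])     -> looks like pooler_output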
Error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-69-6d2cb1238cab> in <module>
45 for i, data in enumerate(trainloader):
46 input_ids, attn_mask, token_type_ids = data['tokens'], data['attention_mask'], data['token_type_ids']
---> 47 start_logits, end_logits = model.forward(input_ids, attn_mask, token_type_ids)
48 print(start_logits.shape)
49 print(end_logits.shape)
<ipython-input-69-6d2cb1238cab> in forward(self, input_ids, attn_masks, token_type_ids)
23
24 # Feeding the input to BERT model to obtain hidden_states of all the layers
---> 25 _, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)
26
27 # Shape of hidden_states is (1, 50, 768)
ValueError: not enough values to unpack (expected 3, got 2)
Model I am using (Bert, XLNet …): Bert
Language I am using the model on: English
The problem arises when using:
- the official example scripts: NA
- my own modified scripts: details of the scripts are below.
The task I am working on is:
- an official GLUE/SQuAD task: NA
- my own task or dataset: Fine-tuning for my own task.
To reproduce
Steps to reproduce the behavior:
- Copy and paste the full code below into a notebook.
- Run as is.
Complete code:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import BertTokenizer, BertModel

# train_data (a pandas DataFrame with 'text', 'sentiment' and 'selected_text' columns)
# and BATCH_SIZE are assumed to be defined earlier in the notebook.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Dataset definition
class TweetDataset(Dataset):
    def __init__(self, data, maxlen, tokenizer):
        self.df = data
        self.tokenizer = tokenizer
        self.maxlen = maxlen

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        """
        Returns the token_ids_tensors, attn_mask for the item and text denoting the sentiment.
        :param index:
        :return:
        """
        # Selecting the sentence and label at the specified index in the data frame
        orig_sentence = self.df.iloc[index]['text']
        sentiment = self.df.iloc[index]['sentiment']
        selected_text = self.df.iloc[index]['selected_text']

        # Preprocessing the text to be suitable for BERT.
        # Encode the sentence. Does the following:
        # 1. Inserts the CLS and SEP tokens at the beginning and end of the sentence
        # 2. Generates the attention mask
        # 3. Generates token_type_ids used to differentiate the first part of the sentence from the second
        encoded_dict = self.tokenizer.encode_plus(
            sentiment,
            orig_sentence,
            max_length=self.maxlen,
            truncation_strategy='only_second',
            add_special_tokens=True,
            pad_to_max_length=True,
            return_tensors='pt',
            return_token_type_ids=True,
            return_attention_mask=True
        )
        tokens = encoded_dict['input_ids'][0]
        token_type_ids = encoded_dict['token_type_ids'][0]
        attn_mask = encoded_dict['attention_mask'][0]

        # Determine the beginning and end of the selected phrase
        def phrase_start_finder(sentence, phrase):
            if phrase not in sentence:
                raise ValueError('s2 not substring of s1')
            start = sentence.find(phrase)
            return len(sentence[:start].strip().split(' '))

        def phrase_end_finder(sentence, phrase):
            if phrase not in sentence:
                raise ValueError('s2 not substring of s1')
            return phrase_start_finder(sentence, phrase) + len(phrase.strip().split(' ')) - 1

        start = phrase_start_finder(orig_sentence, selected_text)
        end = phrase_end_finder(orig_sentence, selected_text)

        return {
            'tokens': tokens,
            'attention_mask': attn_mask,
            'token_type_ids': token_type_ids,
            'start': float(start),
            'end': float(end),
            'sentence': orig_sentence,
            'selected_text': selected_text,
            'sentiment': sentiment
        }

# Defining the loader
dataset = TweetDataset(train_data, 50, tokenizer)
trainloader = DataLoader(
    dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=4
)

# Defining the model
class TweetModel(nn.Module):
    def __init__(self, freeze_bert=True):
        super(TweetModel, self).__init__()
        # Instantiating the BERT model object
        self.bert_model = BertModel.from_pretrained('bert-base-uncased')
        # TODO(Viman): Before training on GPUs and finalization, remove this
        # Freeze BERT layers
        # In the first experiment, not training the previous layers
        if freeze_bert:
            for p in self.bert_model.parameters():
                p.requires_grad = False
        # Final layer. Needs two outputs which are supposed to be logits: startIndex and endIndex
        self.dropout = nn.Dropout(0.2)
        # 768 because the output is a vector of size 768 (dimensionality of the encoder layer)
        self.fc = nn.Linear(768, 2)
        # Initialize the fc layer
        nn.init.normal_(self.fc.weight, std=0.02)
        nn.init.normal_(self.fc.bias, 0)

    def forward(self, input_ids, attn_masks, token_type_ids):
        # Feeding the input to the BERT model to obtain hidden_states of all the layers
        _, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)
        # Shape of hidden_states is (1, 50, 768)
        # TODO(Viman): Try mean as opposed to max
        # hidden_states, _ = torch.max(hidden_states, dim=1)
        # last_hidden_state = hidden_states[-1]
        print(hidden_states.shape)
        X = self.dropout(hidden_states)
        logits = self.fc(X)
        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)
        return start_logits, end_logits

model = TweetModel()

# Testing the model forward implementation
for i, data in enumerate(trainloader):
    input_ids, attn_mask, token_type_ids = data['tokens'], data['attention_mask'], data['token_type_ids']
    start_logits, end_logits = model.forward(input_ids, attn_mask, token_type_ids)
    print(start_logits.shape)
    print(end_logits.shape)
    if i == 1:
        break
Expected behavior
The self.bert_model(input_ids, attn_masks, token_type_ids) call should return a tuple containing 4 elements; however, it seems to return only 2.
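A quick check that seems consistent with this; a minimal sketch, assuming a transformers 2.x release where the extra outputs are governed by config flags that default to off:

from transformers import BertConfig

config = BertConfig.from_pretrained('bert-base-uncased')
# If these default to False, only (last_hidden_state, pooler_output) is returned,
# which would explain getting 2 values back instead of 4.
print(config.output_hidden_states)  # False
print(config.output_attentions)     # False

If that is the case, the hidden_states and attentions elements would only appear once these flags are enabled.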
Environment info
- transformers version: 2.9.0
- Platform: Linux-4.19.112+-x86_64-with-debian-buster-sid
- Python version: 3.7.6
- PyTorch version (GPU?): 1.5.0 (False)
- Tensorflow version (GPU?): 2.1.0 (False)
- Using GPU in script?: Not yet
- Using distributed or parallel set-up in script?: No

Also tried with:
- transformers version: 2.11.0
- Platform: Mac / Kaggle notebook (tried in both)
- Python version: 3.7
- PyTorch version (GPU?): No
- Tensorflow version (GPU?): NA
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Top GitHub Comments
I have solved the same issue as you, but in a different situation. You should double-check the batch dimension of your input data: tokens_ids_tensor and attn_mask should be 2D tensors, not 1D. With a batch size of 1 they should have shape [1, seq_len] rather than [seq_len], and for a batch size of n they should have shape [n, seq_len], as in the sketch below.
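A rough illustration of the shapes being described (the token IDs below are arbitrary placeholders):

import torch

# 1D tensor: what a Dataset __getitem__ typically returns for a single example
tokens_1d = torch.tensor([101, 7592, 2088, 102])   # shape: torch.Size([4])
# 2D tensor: what the model expects, with the batch dimension first
tokens_2d = tokens_1d.unsqueeze(0)                 # shape: torch.Size([1, 4])
print(tokens_1d.shape, tokens_2d.shape)
# For a batch size of n the shape should be [n, seq_len]; a DataLoader normally
# adds this batch dimension automatically when it collates the dataset items.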
I set output_hidden_states=True in the forward method, but the same error keeps showing up. I restarted the kernel and double-checked the rest of the code; I am not sure whether it is related to some other parameter in the training. Here is my forward pass:

And here is the training code where the issue occurs:

Are you suspecting other places in the code to be the issue?
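For reference, a minimal sketch of the pattern that should avoid the unpacking error, assuming a transformers 2.x release where output_hidden_states is read from the model config at load time rather than from a forward() argument (the sentiment/tweet strings are placeholders):

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Assumption: the flag is stored in the config when the model is loaded,
# instead of being passed to forward().
bert = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)

enc = tokenizer.encode_plus('neutral', 'just an example tweet',
                            return_token_type_ids=True,
                            return_attention_mask=True,
                            return_tensors='pt')
# With the flag enabled the output tuple has three elements,
# so a three-way unpack like the one in TweetModel.forward no longer fails:
last_hidden_state, pooler_output, hidden_states = bert(
    enc['input_ids'],
    attention_mask=enc['attention_mask'],
    token_type_ids=enc['token_type_ids']
)
print(len(hidden_states))  # 13 for bert-base: embedding output + 12 encoder layers

Later transformers releases also accept output_hidden_states as an argument to forward, but I don't believe that applies to 2.x, which could explain why setting it in the forward method did not help.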