
[Bert Model] ValueError: not enough values to unpack (expected 3, got 2)


🐛 Bug: ValueError: not enough values to unpack (expected 3, got 2)

Information

I am using BERT initialized with 'bert-base-uncased'. As per the documentation, the forward step is supposed to yield 4 outputs:

  • last_hidden_state
  • pooler_output
  • hidden_states
  • attentions

But when I initialize BERT and call the forward method, it yields only 2 results. Based on their shapes, I believe they are last_hidden_state and pooler_output.

self.bert_model = BertModel.from_pretrained('bert-base-uncased')
_, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)
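
A quick way to see what the call actually returns under the installed version (a small sketch reusing the names above):

outputs = self.bert_model(input_ids, attn_masks, token_type_ids)
print(len(outputs))        # 2 with the default config: (last_hidden_state, pooler_output)
print(outputs[0].shape)    # (batch_size, seq_len, 768)
print(outputs[1].shape)    # (batch_size, 768)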

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-69-6d2cb1238cab> in <module>
     45 for i, data in enumerate(trainloader):
     46     input_ids, attn_mask, token_type_ids = data['tokens'], data['attention_mask'], data['token_type_ids']
---> 47     start_logits, end_logits = model.forward(input_ids, attn_mask, token_type_ids)
     48     print(start_logits.shape)
     49     print(end_logits.shape)

<ipython-input-69-6d2cb1238cab> in forward(self, input_ids, attn_masks, token_type_ids)
     23 
     24         # Feeding the input to BERT model to obtain hidden_states of all the layers
---> 25         _, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)
     26 
     27         # Shape of hidden_states is (1, 50, 768)

ValueError: not enough values to unpack (expected 3, got 2)

Model I am using (Bert, XLNet …): Bert
Language I am using the model on: English

The problem arises when using:

  • the official example scripts: NA
  • my own modified scripts: the full script details are below.

The tasks I am working on is:

  • an official GLUE/SQuAD task: NA
  • my own task or dataset: Fine-tuning for my own task.

To reproduce

Steps to reproduce the behavior:

  1. Copy and paste the full code below into a notebook.
  2. Run as is.

Complete code:

# Imports assumed by the snippet below (train_data and BATCH_SIZE are defined elsewhere)
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Dataset definition
class TweetDataset(Dataset):
    def __init__(self, data, maxlen, tokenizer):
        self.df = data
        self.tokenizer = tokenizer
        self.maxlen = maxlen

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        """
        Returns the token_ids_tensors, attn_mask for the item and text denoting the sentiment.

        :param index:
        :return:
        """

        # Selecting the sentence and label at the specified index in the data frame
        orig_sentence = self.df.iloc[index]['text']
        sentiment = self.df.iloc[index]['sentiment']
        selected_text = self.df.iloc[index]['selected_text']

        # Preprocessing the text to be suitable for BERT

        # Encode the sentence. Does the following:
        # 1. Inserting the CLS and SEP token in the beginning and end of the sentence
        # 2. Generates attention mask
        # 3. Generate token_type_ids used to differentiate first part of the sentence from the second
        encoded_dict = self.tokenizer.encode_plus(
            sentiment,
            orig_sentence,
            max_length=self.maxlen,
            truncation_strategy='only_second',
            add_special_tokens=True,
            pad_to_max_length=True,
            return_tensors='pt',
            return_token_type_ids=True,
            return_attention_mask=True
        )
        tokens = encoded_dict['input_ids'][0]
        token_type_ids = encoded_dict['token_type_ids'][0]
        attn_mask = encoded_dict['attention_mask'][0]

        # Determine the beginning and end of the sentence
        def phrase_start_finder(sentence, phrase):
            if phrase not in sentence:
                raise ValueError('s2 not substring of s1')
            start = sentence.find(phrase)
            return len(sentence[:start].strip().split(' '))

        def phrase_end_finder(sentence, phrase):
            if phrase not in sentence:
                raise ValueError('s2 not substring of s1')
            return phrase_start_finder(sentence, phrase) + len(phrase.strip().split(' ')) - 1

        start = phrase_start_finder(orig_sentence, selected_text)
        end = phrase_end_finder(orig_sentence, selected_text)

        return {
            'tokens': tokens,
            'attention_mask': attn_mask,
            'token_type_ids': token_type_ids,
            'start': float(start),
            'end': float(end),
            'sentence': orig_sentence,
            'selected_text': selected_text,
            'sentiment': sentiment
        }

# Defining the loader
dataset = TweetDataset(train_data, 50, tokenizer)

trainloader = DataLoader(
    dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=4
)

# Defining the model
class TweetModel(nn.Module):
    def __init__(self, freeze_bert=True):
        super(TweetModel, self).__init__()
        # Instantiating BERT model object
        self.bert_model = BertModel.from_pretrained('bert-base-uncased')

        # TODO(Viman): Before training on GPUs and finalization, remove this
        # Freeze bert layers
        # In first experiment, not training the previous layers
        if freeze_bert:
            for p in self.bert_model.parameters():
                p.requires_grad = False

        # Final layer. Needs two outputs which are supposed to be logits: startIndex and endIndex
        self.dropout = nn.Dropout(0.2)
        # 768 because output is a vector of size 768 (Dimensionality of the encoder layer)
        self.fc = nn.Linear(768, 2)
        # Intialize the fc layer
        nn.init.normal_(self.fc.weight, std=0.02)
        nn.init.normal_(self.fc.bias, 0)

    def forward(self, input_ids, attn_masks, token_type_ids):

        # Feeding the input to BERT model to obtain hidden_states of all the layers
        _, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)

        # Shape of hidden_states is (1, 50, 768)
        # TODO(Viman): Try mean as opposed to max
        # hidden_states, _ = torch.max(hidden_states, dim=1)

        # last_hidden_state = hidden_states[-1]
        print(hidden_states.shape)

        X = self.dropout(hidden_states)
        logits = self.fc(X)

        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        return start_logits, end_logits
    
model = TweetModel()

# Testing the model forward implementation
for i, data in enumerate(trainloader):    
    input_ids, attn_mask, token_type_ids = data['tokens'], data['attention_mask'], data['token_type_ids']
    start_logits, end_logits = model.forward(input_ids, attn_mask, token_type_ids)
    print(start_logits.shape)
    print(end_logits.shape)
    if i == 1:
        break

Expected behavior

The self.bert_model(input_ids, attn_masks, token_type_ids) line should return a tuple containing 4 elements; however, it seems to return only 2.
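
For reference, in transformers 2.x hidden_states (and attentions) are only returned when they are enabled on the model config; a minimal sketch of requesting the hidden states at load time, assuming the same setup as above:

self.bert_model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)
# forward() now yields (last_hidden_state, pooler_output, hidden_states),
# where hidden_states is a tuple of 13 tensors (embedding output + 12 layers)
_, _, hidden_states = self.bert_model(input_ids, attn_masks, token_type_ids)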

Environment info

  • transformers version: 2.9.0

  • Platform: Linux-4.19.112+-x86_64-with-debian-buster-sid

  • Python version: 3.7.6

  • PyTorch version (GPU?): 1.5.0 (False)

  • Tensorflow version (GPU?): 2.1.0 (False)

  • Using GPU in script?: Not yet

  • Using distributed or parallel set-up in script?: No

  • transformers version: 2.11.0

  • Platform: Mac/Kaggle notebook (Tried in both)

  • Python version: 3.7

  • PyTorch version (GPU?): No

  • Tensorflow version (GPU?): NA

  • Using GPU in script?: No

  • Using distributed or parallel set-up in script?: No

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14 (5 by maintainers)

Top GitHub Comments

11 reactions
lowspace commented, May 9, 2021

I solved the same issue, but in a different situation: you should double-check the batch size of your input data.

tokens_ids_tensor and attn_mask should be 2D tensors, not 1D (see the sketch after the examples below). When the batch size is 1, they should look like:

tensor([[  101,  1030,  1054,  2595,  2015, 21486,  2620,  1030,  3841,  7377,
          8197,  3217,  1030,  1054,  2595,  2015, 21486,  2620,  1024,  1030,
          3841,  7377,  8197,  3217,  1001, 15333,  6342,  2483,  9103,   102]],
       device='cuda:0') 
 tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1]], device='cuda:0')

but not

tensor([  101,  1030,  1054,  2595,  2015, 21486,  2620,  1030,  3841,  7377,
          8197,  3217,  1030,  1054,  2595,  2015, 21486,  2620,  1024,  1030,
          3841,  7377,  8197,  3217,  1001, 15333,  6342,  2483,  9103,   102],
       device='cuda:0') 
 tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1], device='cuda:0')

Further, for a batch size of n, they should look like:

seq is tensor([[  101,  4911,  1024,  ...,     0,     0,     0],
        [  101,  2054,  2057,  ...,  2860, 28400,   102],
        [  101,  7409,  2000,  ...,  1037, 19062,   102],
        ...,
        [  101,  1001,  2446,  ...,  1024,  1013,   102],
        [  101,  1001,  1037,  ...,  2522,  1013,   102],
        [  101,  1001,  4918,  ...,  1013,  1017,   102]], device='cuda:0') 
 attn_masks is tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1],
        ...,
        [1, 1, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1]], device='cuda:0') 
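
One way to go from the 1D case to the expected 2D case when feeding a single example (a minimal sketch; the tensor names follow the examples above):

# A single example comes out as shape (seq_len,); BERT expects (batch_size, seq_len)
tokens_ids_tensor = tokens_ids_tensor.unsqueeze(0)  # (seq_len,) -> (1, seq_len)
attn_mask = attn_mask.unsqueeze(0)                  # (seq_len,) -> (1, seq_len)

Iterating over a DataLoader, as in the reproduction code above, adds this batch dimension automatically.
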
1 reaction
ENGSamShamsan commented, Apr 18, 2021

I set output_hidden_states=True in the forward method; however, the same error keeps showing. I restarted the kernel and double-checked the rest of the code. I am not sure if it's related to some other parameter in the training. Here is my forward pass:

def forward(self,
            input_ids: torch.Tensor,       # Indices of input sequence tokens in the vocabulary.
            attention_mask: torch.Tensor,  # Mask to avoid performing attention on padding token indices.
                                           # Values in [0, 1]: 1 for real tokens, 0 for [PAD] tokens.
            token_type_ids: torch.Tensor,  # Indices to indicate first and second portions of the inputs:
                                           # 0 for sentence A tokens, 1 for sentence B tokens,
                                           # i.e. [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
            intent_labels: torch.Tensor = None,  # The labels for the intent classifier.
            slot_labels: torch.Tensor = None     # The labels for the slot tagging (NER).
            ):

    # Feeding the input to the BERT model to obtain hidden_states of all the layers
    last_hidden_states, pooler_output = self.bert_model(input_ids=input_ids,
                                                        attention_mask=attention_mask,
                                                        token_type_ids=token_type_ids,
                                                        output_hidden_states=True,
                                                        return_dict=False)

7. Define huggingface model

dropout = 0.2
num_intent_labels = len(intent_vocab)
num_slot_labels = len(slot_vocab)

model = ParserModel(model_name_or_path='bert-base-uncased',
                    dropout=dropout,
                    num_intent_labels=num_intent_labels,
                    num_slot_labels=num_slot_labels)

And here is the training code where the issue occurs:

outputs = model(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, slot_labels=slot_labels, intent_labels=intent_labels)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-3d510ec5d296> in <module>()
     31                 token_type_ids=token_type_ids,
     32                 slot_labels=slot_labels,
---> 33                 intent_labels=intent_labels)
     34 slot_loss, intent_loss = outputs[2], outputs[3]
     35 slot_loss.backward(retain_graph=True)  # need to retain_graph when working with multiple losses

3 frames
/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    923         elif input_ids is not None:
    924             input_shape = input_ids.size()
--> 925             batch_size, seq_length = input_shape
    926         elif inputs_embeds is not None:
    927             input_shape = inputs_embeds.size()[:-1]

ValueError: not enough values to unpack (expected 2, got 1)

Do you suspect other parts of the code to be the issue?
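
For what it's worth, the failing frame in that traceback (batch_size, seq_length = input_shape) means input_ids was 1D rather than (batch_size, seq_len), which points back at the batching issue described in the comment above rather than at output_hidden_states. A hedged sketch of a shape check before the call (variable names hypothetical):

# All three inputs must be 2D: (batch_size, seq_len)
if input_ids.dim() == 1:
    input_ids = input_ids.unsqueeze(0)
    attention_mask = attention_mask.unsqueeze(0)
    token_type_ids = token_type_ids.unsqueeze(0)

outputs = model(input_ids=input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids,
                slot_labels=slot_labels,
                intent_labels=intent_labels)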

