
On masked-lm labels and computing the loss

See original GitHub issue

Recently I was using BERT for my own project, and while going through the function `mask_tokens` I found this line of code: `labels[~masked_indices] = -100  # We only compute loss on masked tokens`. I wonder why we do this. I get the part where we do

    indices_replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    inputs[indices_replaced] = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)

to mask the input tokens, but is it necessary for the labels? If the ground truth were a constant -100 while the actual token id is, say, 1000, the loss might never converge.
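For context, the selection logic around that line can be sketched as follows. This is a simplified version of the `mask_tokens` helper (it omits the special-token mask and the 10% random-replacement / 10% keep-original branches), with a placeholder `mask_token_id`:

```python
import torch

torch.manual_seed(0)

vocab_size = 1000
mask_token_id = 4  # placeholder id for [MASK]

inputs = torch.randint(5, vocab_size, (2, 8))  # toy batch of token ids
labels = inputs.clone()

# Select ~15% of positions as prediction targets.
masked_indices = torch.bernoulli(torch.full(labels.shape, 0.15)).bool()

# Positions that were NOT selected get label -100, so the loss skips them.
labels[~masked_indices] = -100

# 80% of the selected positions are replaced with [MASK] in the inputs.
indices_replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
inputs[indices_replaced] = mask_token_id
```

Note that `labels` keeps the *original* token ids at the selected positions, so the model is still trained against the real tokens there, not against -100.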

And I’ve found two seemingly contradictory comments: `labels[~masked_indices] = -100  # We only compute loss on masked tokens` (in `run_language_modeling`) and this docstring (in `modeling_bert`):

```
masked_lm_labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
    Labels for computing the masked language modeling loss.
    Indices should be in ``[-100, 0, ..., config.vocab_size]`` (see ``input_ids`` docstring)
    Tokens with indices set to ``-100`` are ignored (masked), the loss is only computed for the tokens with labels
    in ``[0, ..., config.vocab_size]``
```

One says the loss is only computed on masked tokens, and the other says tokens set to -100 are ignored… Could anyone please clarify this? Thanks.
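A small sketch may help with the wording: -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so positions labeled -100 contribute nothing to the loss. The sketch below uses toy tensors, not the actual BERT model:

```python
import torch
import torch.nn.functional as F

# Toy "logits" over a vocabulary of 5 tokens, for 3 sequence positions.
logits = torch.tensor([[2.0, 0.1, 0.1, 0.1, 0.1],
                       [0.1, 2.0, 0.1, 0.1, 0.1],
                       [0.1, 0.1, 2.0, 0.1, 0.1]])

# Only the middle position is a [MASK]ed position; the other two labels
# are set to -100 so the loss skips them.
labels = torch.tensor([-100, 1, -100])

# F.cross_entropy ignores targets equal to ignore_index (default: -100),
# both in the sum and in the averaging denominator.
loss_all = F.cross_entropy(logits, labels)

# Equivalent: compute the loss only on the masked position.
loss_masked_only = F.cross_entropy(logits[1:2], labels[1:2])

assert torch.allclose(loss_all, loss_masked_only)
```

So both statements describe the same behavior: tokens labeled -100 are ignored, which is exactly how the loss ends up being computed only on the `[MASK]`ed tokens.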

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
gmaratos commented, Nov 17, 2021

> @LysandreJik Isn’t the example mentioned in the official documentation missing the following line of code before feeding the labels into the model?
>
> `labels[inputs.input_ids != tokenizer.mask_token_id] = -100`
>
> I believe with this we calculate the negative log likelihood just for the masked token, which is `Paris` in the given example.

Yes, I was wondering why this is missing as well. There doesn’t seem to be any documentation indicating that this happens automatically before the loss is computed, and based on some limited testing on my end, I get different loss values when I add this line.
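The effect of the line in question can be illustrated with plain tensors (the token ids below are made up for the sketch; in practice they come from the tokenizer):

```python
import torch

mask_token_id = 103  # illustrative [MASK] id

# Input with one [MASK]ed position (index 6).
input_ids = torch.tensor([[101, 1996, 3007, 1997, 2605, 2003, 103, 1012, 102]])

# Labels start as a copy of the *unmasked* input ids; 3000 stands in
# for the id of the expected token at the masked position.
labels = torch.tensor([[101, 1996, 3007, 1997, 2605, 2003, 3000, 1012, 102]])

# The line in question: ignore every position that is not [MASK].
labels[input_ids != mask_token_id] = -100

print(labels)
# tensor([[-100, -100, -100, -100, -100, -100, 3000, -100, -100]])
```

Without this step, the loss would also penalize the model on the unmasked positions, which is why the measured loss values differ.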

0 reactions
achingacham commented, Oct 21, 2020

@LysandreJik Isn’t the example mentioned in the official documentation missing the following line of code before feeding the labels into the model?

`labels[inputs.input_ids != tokenizer.mask_token_id] = -100`

I believe with this we calculate the negative log likelihood just for the masked token, which is `Paris` in the given example.

