On masked-lm labels and computing the loss
Recently I was using BERT for my own project, and going through the function `mask_tokens` I found this line of code:
```python
labels[~masked_indices] = -100  # We only compute loss on masked tokens
```
I wonder why we do this. I get the part where we do

```python
indices_replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
inputs[indices_replaced] = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)
```

to mask the input tokens, but is it necessary for the labels? If I had a constant -100 as the ground truth while the actual id was, say, 1000, the loss might never converge.
And I’ve found two contradictory comments, i.e.
```python
labels[~masked_indices] = -100  # We only compute loss on masked tokens
```
(from `run_language_modeling`)

and

```
masked_lm_labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
    Labels for computing the masked language modeling loss.
    Indices should be in ``[-100, 0, ..., config.vocab_size]`` (see ``input_ids`` docstring)
    Tokens with indices set to ``-100`` are ignored (masked), the loss is only computed for the tokens with labels
    in ``[0, ..., config.vocab_size]``
```
(from `modeling_bert`)
One says the loss is computed on the masked tokens, and the other says they are ignored… Could anyone please clarify this? Thanks.
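Worth noting for the question above: -100 is the default `ignore_index` of PyTorch's `nn.CrossEntropyLoss`, which the masked-LM head uses to compute its loss, so positions whose label is -100 are simply skipped; the docstring's "ignored (masked)" means masked *out of the loss*, not the `[MASK]` input token. A minimal sketch with toy tensors (plain PyTorch; the shapes and values below are made up purely for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, seq_len = 10, 5
logits = torch.randn(seq_len, vocab_size)          # fake per-token scores
labels = torch.randint(0, vocab_size, (seq_len,))  # fake ground-truth token ids

masked_indices = torch.tensor([False, True, False, True, False])
labels_for_loss = labels.clone()
labels_for_loss[~masked_indices] = -100            # non-masked positions get the sentinel

loss_fct = nn.CrossEntropyLoss()                   # ignore_index defaults to -100
loss = loss_fct(logits, labels_for_loss)

# Identical to averaging cross-entropy over the masked positions only:
manual = nn.CrossEntropyLoss()(logits[masked_indices], labels[masked_indices])
print(loss.item(), manual.item())                  # same value
```

Because the default reduction is a mean over the non-ignored positions, adding or omitting the -100 fill also changes the reported loss value, not just which tokens contribute.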
Yes, I was wondering why this is missing as well. There doesn’t seem to be any documentation indicating that this is happening automatically before the loss is computed. And, based on some limited testing on my end, I get different values for the loss when I do this.
@LysandreJik Isn’t the example in the official documentation missing the following line of code before feeding the labels into the model?

```python
labels[inputs.input_ids != tokenizer.mask_token_id] = -100
```

I believe with this we calculate the negative log-likelihood just for the masked token, which is `Paris` in the given example.
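A sketch of what that corrected snippet might look like (assumptions: `bert-base-uncased`, the docs' Paris sentence reconstructed here from memory as "The capital of France is [MASK].", and a transformers version where the loss argument is named `labels` rather than the older `masked_lm_labels` and the output exposes `.loss`):

```python
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]

# Loss over every position (what the example does without the extra line):
loss_all = model(**inputs, labels=labels).loss

# Loss over the [MASK] position only, after filling the rest with -100:
labels_masked_only = labels.clone()
labels_masked_only[inputs.input_ids != tokenizer.mask_token_id] = -100
loss_masked = model(**inputs, labels=labels_masked_only).loss

print(loss_all.item(), loss_masked.item())  # the two values differ
```

Printing both values also reproduces the difference mentioned above: with the -100 fill, only the `[MASK]` position contributes to the negative log-likelihood.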