TFBartForConditionalGeneration with labels padded with -100 gives NaN loss.
I am pretraining T5 and Bart.
I noticed that the padding token for the labels of these models should be -100 (unlike decoder_input_ids, which keep the regular pad token).
I changed the padding token for the labels for T5 (PyTorch, TensorFlow) and Bart (PyTorch), and it works well. But Bart (TensorFlow) gives a NaN loss.
Because of this, I also get an error message during pretraining:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -100 which is outside the valid range of [0, 50265). Label values: 0 2387 2335 16 11962 2 -100 -100 -100 -100 -100 ...........
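For comparison, the equivalent PyTorch Bart call gives a finite loss (a minimal sketch of what I mean; the exact snippet here is illustrative):
from transformers import BartTokenizer, BartForConditionalGeneration
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
inputs = tokenizer("My dog is <mask>", return_tensors='pt', truncation=True, max_length=16, padding="max_length")
labels = tokenizer("My dog is cute", return_tensors='pt', truncation=True, max_length=16, padding="max_length").input_ids
## replace pad tokens with -100 so the loss ignores them
labels[labels == tokenizer.pad_token_id] = -100
loss = model(**inputs, labels=labels).loss  ## finite loss, no NaN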
Environment info
- transformers version: 4.2.2
- Platform: Ubuntu 18.04
- Python version: 3.6
- PyTorch version (GPU?):
- Tensorflow version (GPU?): 2.4.0
- Using GPU in script?: yes (colab)
- Using distributed or parallel set-up in script?: no
Bart: @patrickvonplaten
Information
Model I am using (Bert, XLNet …): TFBartForConditionalGeneration
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
import tensorflow as tf
from transformers import BartTokenizer, TFBartForConditionalGeneration
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = TFBartForConditionalGeneration.from_pretrained("facebook/bart-base")
inputs = tokenizer("My dog is <mask>", return_tensors='tf', truncation=True, max_length=16, padding="max_length")
labels_ids = tokenizer("My dog is cute", return_tensors='tf', truncation=True, max_length=16, padding="max_length").input_ids
## labels padding_token = 1
loss = model(inputs, labels=labels_ids)[0]
print(labels_ids)
print(loss)
## labels padding_token = -100
labels_ids = tf.where(
    labels_ids == 1, tf.fill(tf.shape(labels_ids), tf.constant(-100, dtype='int32')), labels_ids
)
loss = model(inputs, labels=labels_ids)[0]
print(labels_ids)
print(loss)
Results:
tf.Tensor(
[[ 0 2387 2335 16 11962 2 1 1 1 1 1 1
1 1 1 1]], shape=(1, 16), dtype=int32)
tf.Tensor(
[2.2291888e-05 4.8874615e-05 3.7073401e-05 7.9230859e-04 6.1941872e+00
1.1058841e+00], shape=(6,), dtype=float32)
tf.Tensor(
[[ 0 2387 2335 16 11962 2 -100 -100 -100 -100 -100 -100
-100 -100 -100 -100]], shape=(1, 16), dtype=int32)
tf.Tensor(
[2.2291888e-05 4.8755410e-05 3.7073401e-05 7.9242775e-04 6.1941872e+00
1.1058841e+00 nan nan nan nan
nan nan nan nan nan
nan], shape=(16,), dtype=float32)
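As a workaround for now (my own sketch, not an official API), one can keep the regular pad token in the labels passed to the model and mask the padded positions out of a hand-rolled loss on the returned logits:
## workaround sketch: labels keep pad_token_id so decoder_input_ids are still built correctly
labels_padded = tokenizer("My dog is cute", return_tensors='tf', truncation=True, max_length=16, padding="max_length").input_ids
outputs = model(inputs, labels=labels_padded)
logits = outputs.logits  ## shape (1, 16, vocab_size)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
per_token_loss = loss_fn(labels_padded, logits)  ## shape (1, 16)
mask = tf.cast(labels_padded != tokenizer.pad_token_id, per_token_loss.dtype)
loss = tf.reduce_sum(per_token_loss * mask) / tf.reduce_sum(mask)
print(loss)  ## finite scalar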
Great catch @kiyoungkim1!
It’s not very consistent what we are doing here… TFBart should never have ignored the pad_token_id as a default setting, but -100, as all other models do.
To fix the problem, I think we should add a couple of lines that check whether -100 appears in the labels and, if so, replace it with the pad_token_id, to stay consistent with PyTorch’s Bart. It would be a pretty big breaking change to just replace pad_token_id with -100, so I think the first option is the better one. @kiyoungkim1, if you feel like opening a PR to correct this behavior, we would be more than happy 😃
We also plan to turn all the loss computation into a layer rather than a method, so it will be much easier to use, easier to configure, and compliant with TensorFlow workflows.
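Roughly, the proposed check could look something like this (an illustrative sketch with made-up names, not the actual implementation):
import tensorflow as tf

def _replace_ignore_index(labels, pad_token_id):
    ## illustrative: if -100 shows up in the labels, swap it for pad_token_id
    ## so the existing TF loss masking still applies
    return tf.where(
        labels == -100,
        tf.fill(tf.shape(labels), tf.constant(pad_token_id, dtype=labels.dtype)),
        labels,
    )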