assertion failed: [predictions must be >= 0]
Trying to train a binary classifier over sentence pairs with a custom dataset throws a TensorFlow error.
Environment info
- transformers version: 4.2.2
- Platform: Ubuntu 18.04
- Python version: 3.7.5
- PyTorch version (GPU?):
- Tensorflow version (GPU): 2.3.1
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Nope
Who can help
Information
Model I am using (TFRoberta, TFXLMRoberta…):
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: https://huggingface.co/transformers/training.html#fine-tuning-in-native-tensorflow-2
- my own task or dataset:
To reproduce
Steps to reproduce the behavior:
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
from tensorflow.keras.metrics import Precision, Recall
import tensorflow as tf


def build_dataset(tokenizer, filename):
    # Read tab-separated "sentence1<TAB>sentence2<TAB>label" lines
    data = [[], [], []]
    with open(filename, 'r') as file_:
        for line in file_:
            fields = line.split('\t')
            data[0].append(fields[0].strip())
            data[1].append(fields[1].strip())
            data[2].append(int(fields[2].strip()))
    sentences = tokenizer(data[0], data[1],
                          padding=True,
                          truncation=True)
    return tf.data.Dataset.from_tensor_slices((dict(sentences),
                                               data[2]))


settings = {
    "model": 'roberta-base',
    "batch_size": 8,
    "n_classes": 1,
    "epochs": 10,
    "steps_per_epoch": 128,
    "patience": 5,
    "loss": "binary_crossentropy",
    "lr": 5e-7,
    "clipnorm": 1.0,
}

tokenizer = AutoTokenizer.from_pretrained(settings["model"])
train_dataset = build_dataset(tokenizer, 'train.head')
train_dataset = train_dataset.shuffle(
    len(train_dataset)).batch(settings["batch_size"])
dev_dataset = build_dataset(tokenizer, 'dev.head').batch(
    settings["batch_size"])

model = TFAutoModelForSequenceClassification.from_pretrained(
    settings['model'],
    num_labels=1)
model.compile(optimizer='adam',
              #loss='binary_crossentropy',
              loss=model.compute_loss,
              metrics=[Precision(name='p'), Recall(name='r')])
model.summary()
model.fit(train_dataset,
          epochs=settings["epochs"],
          #steps_per_epoch=steps_per_epoch,
          validation_data=dev_dataset,
          batch_size=settings["batch_size"],
          verbose=1)
This gives the following output:
Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Model: "tf_roberta_for_sequence_classification"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
roberta (TFRobertaMainLayer) multiple 124055040
_________________________________________________________________
classifier (TFRobertaClassif multiple 591361
=================================================================
Total params: 124,646,401
Trainable params: 124,646,401
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
Traceback (most recent call last):
File "finetune.py", line 52, in <module>
verbose=1)
File "/work/user/bicleaner-neural/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
return method(self, *args, **kwargs)
File "/work/user/bicleaner-neural/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
tmp_logs = train_function(iterator)
File "/work/user/bicleaner-neural/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/work/user/bicleaner-neural/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "/work/user/bicleaner-neural/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/work/user/bicleaner-neural/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
cancellation_manager=cancellation_manager)
File "/work/user/bicleaner-neural/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/work/user/bicleaner-neural/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
ctx=ctx)
File "/work/user/bicleaner-neural/venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (tf_roberta_for_sequence_classification/classifier/out_proj/BiasAdd:0) = ] [[0.153356239][0.171548933][0.121127911]...] [y (Cast_3/x:0) = ] [0]
[[{{node assert_greater_equal/Assert/AssertGuard/else/_1/assert_greater_equal/Assert/AssertGuard/Assert}}]]
[[assert_greater_equal_1/Assert/AssertGuard/pivot_f/_31/_205]]
(1) Invalid argument: assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (tf_roberta_for_sequence_classification/classifier/out_proj/BiasAdd:0) = ] [[0.153356239][0.171548933][0.121127911]...] [y (Cast_3/x:0) = ] [0]
[[{{node assert_greater_equal/Assert/AssertGuard/else/_1/assert_greater_equal/Assert/AssertGuard/Assert}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_20780]
Function call stack:
train_function -> train_function
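For context: Keras's Precision and Recall metrics assert that every prediction lies in [0, 1], while a num_labels=1 classification head emits unbounded raw logits (like the 0.153356239 values shown in the truncated traceback). A minimal stdlib sketch, not tied to the script above, of the squashing that keeps predictions in the asserted range:

```python
import math

def sigmoid(z: float) -> float:
    # Squash an unbounded logit into (0, 1), the range that
    # Keras Precision/Recall assert their inputs lie in.
    return 1.0 / (1.0 + math.exp(-z))

# Raw classifier-head outputs are unbounded; after the sigmoid
# every value satisfies 0 <= p <= 1, so the assertion cannot fire.
logits = [0.153356239, -2.4, 5.1]
probs = [sigmoid(z) for z in logits]
print(probs)
```

The repro script feeds the head's raw logits straight into the metrics, with no such squashing in between.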
The dataset examples look like this:
print(list(train_dataset.take(1).as_numpy_iterator()))
[({'input_ids': array([[ 0, 133, 864, ..., 1, 1, 1],
[ 0, 133, 382, ..., 1, 1, 1],
[ 0, 1121, 645, ..., 1, 1, 1],
...,
[ 0, 133, 864, ..., 1, 1, 1],
[ 0, 1121, 144, ..., 1, 1, 1],
[ 0, 495, 21046, ..., 1, 1, 1]], dtype=int32), 'attention_mask': array([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]], dtype=int32)}, array([0, 0, 0, 0, 1, 0, 0, 0], dtype=int32))]
Expected behavior
Issue Analytics
- State:
- Created 3 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Figured out that the error was thrown by the Precision and Recall classes, because they require values between 0 and 1. In case someone wants to use them when training with native TensorFlow, I managed to add the argmax to the classes by overriding them. So the code that I posted works with num_classes=2 and the overridden classes as metrics.

For binary classification you have two labels and thus two neurons; it is more intuitive to proceed that way 😃 But yes, you can also do what you propose and round the float value to 0 or 1 depending on the output of your sigmoid activation. Nevertheless, our models don't provide such an approach.
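The exact override was not included in the comment; a hedged sketch of what it describes follows. With num_labels=2 the model emits a pair of class logits per example, and wrapping Keras Precision so it argmaxes them first keeps the values handed to update_state in {0, 1}, as the metric requires. The class name `ArgmaxPrecision` is illustrative, not from the original thread.

```python
import tensorflow as tf

class ArgmaxPrecision(tf.keras.metrics.Precision):
    """Precision over argmaxed class logits (hypothetical name)."""

    def update_state(self, y_true, y_pred, sample_weight=None):
        # (batch, 2) logits -> {0, 1} class ids, which satisfies the
        # metric's assertion that predictions lie in [0, 1].
        y_pred = tf.argmax(y_pred, axis=-1)
        return super().update_state(y_true, y_pred, sample_weight)

# Predicted classes here are [1, 0, 1], matching the labels exactly.
m = ArgmaxPrecision(name='p')
m.update_state([1, 0, 1], [[0.1, 2.0], [3.0, -1.0], [-0.5, 0.2]])
print(float(m.result()))
```

An analogous Recall subclass works the same way; passing `metrics=[ArgmaxPrecision(name='p')]` to `model.compile()` then replaces the stock metric. The alternative the maintainer mentions — a single sigmoid neuron whose output is rounded to 0 or 1 — avoids the subclass at the cost of a head the library does not provide out of the box.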