TensorFlow "predict" returns empty output with MirroredStrategy

I’m trying to use the predict method of the Keras TensorFlow API, but it returns an empty output even though the input is processed. Calling the model directly seems to work.

EDIT: the predict method works correctly if the model is loaded with a single-GPU strategy.

Environment info

transformers version: 4.5.1
Platform: Linux CentOS 8.1
Python version: 3.7.10
PyTorch version (GPU?): -
Tensorflow version (GPU?): 2.3.2 (True)
Using GPU in script?: yes
Using distributed or parallel set-up in script?: multi-gpu on a single machine

Information

Model I am using: Bert

The problem arises when using:

the official example scripts: (give details below)
my own modified scripts: (give details below)

The task I am working on is:

an official GLUE/SQUaD task: (give the name)
my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

from transformers import BertTokenizerFast, TFBertForSequenceClassification
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
# Single-GPU alternative (see the EDIT above: predict works with a single-GPU strategy):
# strategy = tf.distribute.OneDeviceStrategy("/gpu:0")
with strategy.scope():
    tf_model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')

    tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')
    inputs = tokenizer('This is a test', 'Esto es una prueba',
                       return_tensors='tf', max_length=200,
                       padding='max_length', truncation=True,
                       return_attention_mask=True,
                       return_token_type_ids=False)
    print(tf_model.predict([inputs["input_ids"], inputs["attention_mask"]],
                           verbose=1))
    print(tf_model([inputs["input_ids"], inputs["attention_mask"]]))

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
WARNING:tensorflow:From /venv/lib/python3.7/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
1/1 [==============================] - 0s 241us/step
TFSequenceClassifierOutput(loss=None, logits=None, hidden_states=None, attentions=None)
TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[-0.47814545,  0.35146457]], dtype=float32)>, hidden_states=None, attentions=None)

Expected behavior

The output of predict should be the same as the output of calling the model directly.
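Since the report above says that calling the model directly returns logits, one possible stopgap is to loop over the inputs and call the model on each slice instead of using predict. The following is only a minimal sketch under that assumption: `call_in_batches` and `batch_size` are hypothetical names, not part of transformers or Keras, and the loop does not spread work across replicas the way predict is meant to.

```python
def call_in_batches(model, input_ids, attention_mask, batch_size=8):
    """Call `model` directly on consecutive slices instead of using predict().

    `model` is any callable that accepts [input_ids, attention_mask];
    the inputs only need to support len() and slicing.
    """
    results = []
    for start in range(0, len(input_ids), batch_size):
        stop = start + batch_size
        # Same call form as the working line in the reproduction script.
        results.append(model([input_ids[start:stop], attention_mask[start:stop]]))
    return results
```

With the reproduction script, this would be used as `call_in_batches(tf_model, inputs["input_ids"], inputs["attention_mask"])`, returning one output object per slice.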

Issue Analytics

State:
Created 2 years ago
Reactions: 2
Comments: 23 (17 by maintainers)

Top GitHub Comments

5 reactions

ayalaall commented, Jul 14, 2021

Hi, Any updates about this issue?

3 reactions

Rocketknight1 commented, May 21, 2021

Putting this here as a writeup of what we know so far:

The issue is not caused by returning an OrderedDict, but instead because we return a TFBaseModelOutput, which is a subclass of OrderedDict decorated with dataclass. Refer to the code here.

If we just return a dict, OrderedDict or ModelOutput (the parent class for TFBaseModelOutput, subclassed from OrderedDict), everything works okay. Therefore the central issue is this data class, which will probably need to be removed. We’re looking at how we can do that now!
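To make the type shape in this writeup concrete, here is a small standard-library sketch. `SketchOutput` is a hypothetical stand-in for an OrderedDict subclass decorated with dataclass; it is not the transformers class. `to_plain_dict` is likewise a hypothetical helper that copies the populated fields into a plain dict, the kind of return value the comment says works. The sketch does not run TensorFlow, so it illustrates the structure only and does not reproduce or fix the predict behaviour.

```python
from collections import OrderedDict
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


@dataclass
class SketchOutput(OrderedDict):
    """Hypothetical stand-in: an OrderedDict subclass decorated with dataclass."""
    loss: Optional[Any] = None
    logits: Optional[Any] = None


def to_plain_dict(output):
    """Copy the dataclass fields that are not None into a plain dict."""
    return {
        f.name: getattr(output, f.name)
        for f in fields(output)
        if getattr(output, f.name) is not None
    }


out = SketchOutput(logits=[[-0.478, 0.351]])
print(isinstance(out, OrderedDict), is_dataclass(out))
print(to_plain_dict(out))
```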
