TensorFlow "predict" returns empty output with MirroredStrategy

I’m trying to use the `predict` method of the Keras TensorFlow API, but it returns an empty output (`logits=None`) even though the input is processed: the progress bar completes. Calling the model directly works.

EDIT: `predict` works correctly if the model is loaded under a single-GPU strategy (`tf.distribute.OneDeviceStrategy`, commented out in the script below).

Environment info

  • transformers version: 4.5.1
  • Platform: Linux CentOS 8.1
  • Python version: 3.7.10
  • PyTorch version (GPU?): -
  • TensorFlow version (GPU?): 2.3.2 (True)
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: multi-gpu on a single machine

Who can help

Information

Model I am using: Bert

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

from transformers import BertTokenizerFast, TFBertForSequenceClassification
import tensorflow as tf

# Multi-GPU strategy: predict() returns an empty output
strategy = tf.distribute.MirroredStrategy()
# Single-GPU strategy: predict() works as expected
#strategy = tf.distribute.OneDeviceStrategy("/gpu:0")
with strategy.scope():
    tf_model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')

    tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')
    inputs = tokenizer('This is a test', 'Esto es una prueba',
                       return_tensors='tf', max_length=200,
                       padding='max_length', truncation=True,
                       return_attention_mask=True,
                       return_token_type_ids=False)
    # Keras predict(): prints an output object whose logits are None
    print(tf_model.predict([inputs["input_ids"], inputs["attention_mask"]],
                           verbose=1))
    # Direct call: prints the expected logits
    print(tf_model([inputs["input_ids"], inputs["attention_mask"]]))

Output:

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
WARNING:tensorflow:From /venv/lib/python3.7/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
1/1 [==============================] - 0s 241us/step
TFSequenceClassifierOutput(loss=None, logits=None, hidden_states=None, attentions=None)
TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[-0.47814545,  0.35146457]], dtype=float32)>, hidden_states=None, attentions=None)

Expected behavior

The output of `predict` should match the output of calling the model directly.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 23 (17 by maintainers)

Top GitHub Comments

5 reactions
ayalaall commented, Jul 14, 2021

Hi, any updates on this issue?

3 reactions
Rocketknight1 commented, May 21, 2021

Putting this here as a writeup of what we know so far:

The issue is not caused by returning an OrderedDict, but by returning a TFBaseModelOutput, which is a subclass of OrderedDict decorated with @dataclass. Refer to the code here.

If we instead return a plain dict, an OrderedDict, or a ModelOutput (the parent class of TFBaseModelOutput, itself subclassed from OrderedDict), everything works. The central issue is therefore the dataclass decorator, which will probably need to be removed. We’re looking at how to do that now!
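The writeup does not spell out why the `@dataclass` decorator matters. One plausible mechanism, offered as an assumption rather than a confirmed diagnosis: code that merges per-replica outputs can rebuild a mapping by calling its type with `(key, value)` pairs. That works for an `OrderedDict` subclass, but a class whose `__init__` is generated by `@dataclass` takes its fields as arguments instead. The pure-Python sketch below uses hypothetical stand-ins (`PlainOutput`, `DataclassOutput`, and the helper `rebuild_like`). It does not use the real transformers or TensorFlow code, and the real `ModelOutput` adds its own `__post_init__` logic that this simplified model omits.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Optional


class PlainOutput(OrderedDict):
    """Hypothetical OrderedDict subclass without @dataclass."""


@dataclass
class DataclassOutput(OrderedDict):
    """Hypothetical stand-in for a dataclass-decorated output class."""
    loss: Optional[Any] = None
    logits: Optional[Any] = None


def rebuild_like(instance, values):
    """Hypothetical generic helper: rebuild a mapping of the same type as
    `instance` by passing (key, value) pairs to its constructor."""
    return type(instance)((key, values[key]) for key in values)


values = {"loss": None, "logits": [[-0.48, 0.35]]}

# OrderedDict's constructor accepts an iterable of pairs, so the values survive.
plain = rebuild_like(PlainOutput(), values)
print(plain["logits"])  # [[-0.48, 0.35]]

# The @dataclass-generated __init__(loss=None, logits=None) binds the whole
# generator to the first field, `loss`, and never fills the mapping itself.
rebuilt = rebuild_like(DataclassOutput(), values)
print(rebuilt.logits)  # None
print(len(rebuilt))    # 0
```

In this sketch, the rebuilt dataclass object ends up with `logits=None` and no entries, which resembles the empty `predict` output above. That resemblance is consistent with the observation that a plain dict or OrderedDict works, but it does not by itself prove that this is the code path involved in TF 2.3.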
