
Issue in layer-drop implementation in TensorFlow models in graph mode


Environment info

  • transformers version: 4.8.1
  • Platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.10
  • PyTorch version (GPU?): 1.9.0+cu102 (False)
  • Tensorflow version (GPU?): 2.5.0 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

@Rocketknight1

Information

Model I am using: TFBartForConditionalGeneration

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

import numpy as np
import tensorflow as tf
from tqdm.auto import tqdm
from transformers import TFBartForConditionalGeneration, BartConfig

# Keep encoder_layerdrop at an unusually high value so the error is easy to trigger.
model = TFBartForConditionalGeneration(BartConfig(encoder_layerdrop=0.5))

# Four random token-ID sequences of length 256.
array = np.random.randint(1, 300, size=(4, 256))
dataset = tf.constant(array, dtype=tf.int32)

# This step works perfectly when the `tf.function` decorator is removed.
@tf.function
def train_step(tensor):
    return model(tensor, training=True)

for tensor in tqdm(dataset, total=len(dataset)):
    tensor = tf.expand_dims(tensor, 0)
    output = train_step(tensor)

You can also check out this small Colab notebook to reproduce the error.

ValueError: in user code:

    <ipython-input-5-ca2e97b30313>:4 train_step  *
        return model(tensor, training=True)
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:1393 call  *
        outputs = self.model(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:1125 call  *
        inputs["encoder_outputs"] = self.encoder(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:764 call  *
        hidden_states, attn = encoder_layer(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:305 call  *
        hidden_states, self_attn_weights, _ = self.self_attn(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:178 call  *
        query_states = self.q_proj(hidden_states) * self.scaling
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py:1023 __call__  **
        self._maybe_build(inputs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py:2625 _maybe_build
        self.build(input_shapes)  # pylint:disable=not-callable
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/core.py:1198 build
        trainable=True)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py:655 add_weight
        caching_device=caching_device)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py:815 _add_variable_with_custom_getter
        **kwargs_for_getter)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py:139 make_variable
        shape=variable_shape if variable_shape else None)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:260 __call__
        return cls._variable_v1_call(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:221 _variable_v1_call
        shape=shape)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py:769 invalid_creator_scope
        "tf.function-decorated function tried to create "

    ValueError: tf.function-decorated function tried to create variables on non-first call.
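
For context, the failure can be reproduced without transformers at all. Below is a minimal sketch of the same failure mode (my own illustration, not library code): tf.function can trace a variable-creating function more than once, and because Python-level random.uniform is re-evaluated on every trace, each trace may build a different subset of lazily-constructed layers.

import random

import tensorflow as tf

# Lazily-built layers: their weights are only created on first call.
layers = [tf.keras.layers.Dense(4) for _ in range(8)]

@tf.function
def forward(x):
    for layer in layers:
        # Python-level randomness is evaluated at *trace* time, so each
        # trace can skip a different subset of layers.
        if random.uniform(0, 1) < 0.5:
            continue  # "drop" this layer
        x = layer(x)
    return x

# Very likely raises:
#   ValueError: tf.function-decorated function tried to create variables
#   on non-first call.
forward(tf.ones((1, 4)))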

Side note: I have checked the same thing for TFWav2Vec2, and the same issue occurs there. So possibly all TF models that use layer-drop need to be fixed.

Expected behavior

Layer drop should work in graph mode just as it does in eager execution.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction
Rocketknight1 commented, Jun 28, 2021

Cool! We’ll hold off on disabling it for now - if you find a solution, let us know, and don’t panic if it turns out to be impossible - just say so and we’ll close this issue and disable layerdrop in graph mode instead. Thanks for your help!

1 reaction
Rocketknight1 commented, Jun 28, 2021

On investigation, I’m pretty sure the issue is caused by the way we’re doing layerdrop: https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/modeling_tf_bart.py#L755-L772

This code is correct for eager execution, but I suspect that in graph mode it leads to the creation of new variables and graph edges whenever a layer is skipped for the first time. I can see some workarounds, but unfortunately no perfect ones - this seems like a fundamental limitation of the way graph mode works in TF.

You’re welcome to investigate and try to find a solution if you like, but we’re probably just going to explicitly disable layer drop in graph mode for now.
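
For reference, here is one possible graph-compatible workaround, sketched under my own assumptions rather than taken from transformers: always call the layer, so its variables are created on the first trace, and make the drop decision a tensor-level operation instead of a Python if. The trade-off is that a dropped layer is still computed and its output merely discarded.

import tensorflow as tf

def call_with_layerdrop(encoder_layer, hidden_states, layerdrop, training):
    # Hypothetical helper, not the transformers API (a real BART encoder
    # layer returns a tuple; here we assume it returns hidden states).
    # The layer runs unconditionally, so its variables exist after the
    # first trace and every trace sees the same graph structure.
    layer_output = encoder_layer(hidden_states)
    if not training:
        return layer_output
    # The drop decision is a tensor resolved at run time, not at trace
    # time, so retracing never changes which variables are created.
    drop = tf.random.uniform([]) < layerdrop
    return tf.cond(drop, lambda: hidden_states, lambda: layer_output)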
