
Issue in layer-drop implementation in TensorFlow models in graph mode


Environment info

  • transformers version: 4.8.1
  • Platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.10
  • PyTorch version (GPU?): 1.9.0+cu102 (False)
  • Tensorflow version (GPU?): 2.5.0 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

@Rocketknight1

Information

Model I am using: TFBartForConditionalGeneration

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

import numpy as np
import tensorflow as tf
from tqdm.auto import tqdm
from transformers import TFBartForConditionalGeneration, BartConfig

# Keep encoder_layerdrop at an unusually high value so the error is easy to trigger.
model = TFBartForConditionalGeneration(BartConfig(encoder_layerdrop=0.5))

# Four random token-ID sequences of length 256.
array = np.random.randint(1, 300, size=(4, 256))
dataset = tf.constant(array, dtype=tf.int32)

# This step works perfectly when the `tf.function` decorator is removed.
@tf.function
def train_step(tensor):
    return model(tensor, training=True)

for tensor in tqdm(dataset, total=len(dataset)):
    tensor = tf.expand_dims(tensor, 0)
    output = train_step(tensor)

You can also check out this small Colab notebook to reproduce the error.

ValueError: in user code:

    <ipython-input-5-ca2e97b30313>:4 train_step  *
        return model(tensor, training=True)
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:1393 call  *
        outputs = self.model(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:1125 call  *
        inputs["encoder_outputs"] = self.encoder(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:764 call  *
        hidden_states, attn = encoder_layer(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:305 call  *
        hidden_states, self_attn_weights, _ = self.self_attn(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:178 call  *
        query_states = self.q_proj(hidden_states) * self.scaling
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py:1023 __call__  **
        self._maybe_build(inputs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py:2625 _maybe_build
        self.build(input_shapes)  # pylint:disable=not-callable
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/core.py:1198 build
        trainable=True)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py:655 add_weight
        caching_device=caching_device)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py:815 _add_variable_with_custom_getter
        **kwargs_for_getter)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py:139 make_variable
        shape=variable_shape if variable_shape else None)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:260 __call__
        return cls._variable_v1_call(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:221 _variable_v1_call
        shape=shape)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py:769 invalid_creator_scope
        "tf.function-decorated function tried to create "

    ValueError: tf.function-decorated function tried to create variables on non-first call.
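
For context, the failure can be reproduced without transformers at all. Below is a minimal sketch of the same failure mode (my own illustration, not library code): tf.function can trace a variable-creating function more than once, and because Python-level random.uniform is re-evaluated on every trace, each trace may build a different subset of lazily-constructed layers.

import random

import tensorflow as tf

# Lazily-built layers: their weights are only created on first call.
layers = [tf.keras.layers.Dense(4) for _ in range(8)]

@tf.function
def forward(x):
    for layer in layers:
        # Python-level randomness is evaluated at *trace* time, so each
        # trace can skip a different subset of layers.
        if random.uniform(0, 1) < 0.5:
            continue  # "drop" this layer
        x = layer(x)
    return x

# Very likely raises:
#   ValueError: tf.function-decorated function tried to create variables
#   on non-first call.
forward(tf.ones((1, 4)))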

Side note: I have checked the same thing for TFWav2Vec2, and the same issue occurs there. So possibly all TF models that use layer-drop need to be fixed.

Expected behavior

Layer drop should work in graph mode just as it does in eager execution.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction
Rocketknight1 commented, Jun 28, 2021

Cool! We’ll hold off on disabling it for now - if you find a solution, let us know, and don’t panic if it turns out to be impossible - just say so and we’ll close this issue and disable layerdrop in graph mode instead. Thanks for your help!

1 reaction
Rocketknight1 commented, Jun 28, 2021

On investigation, I’m pretty sure the issue is caused by the way we’re doing layerdrop: https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/modeling_tf_bart.py#L755-L772

This code is correct for eager execution, but I suspect that in graph mode it leads to the creation of new variables and graph edges whenever a layer is skipped for the first time. I can see some workarounds, but unfortunately no perfect ones - this seems like a fundamental limitation of the way graph mode works in TF.

You’re welcome to investigate and try to find a solution if you like, but we’re probably just going to explicitly disable layer drop in graph mode for now.
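
For reference, here is one possible graph-compatible workaround, sketched under my own assumptions rather than taken from transformers: always call the layer, so its variables are created on the first trace, and make the drop decision a tensor-level operation instead of a Python if. The trade-off is that a dropped layer is still computed and its output merely discarded.

import tensorflow as tf

def call_with_layerdrop(encoder_layer, hidden_states, layerdrop, training):
    # Hypothetical helper, not the transformers API (a real BART encoder
    # layer returns a tuple; here we assume it returns hidden states).
    # The layer runs unconditionally, so its variables exist after the
    # first trace and every trace sees the same graph structure.
    layer_output = encoder_layer(hidden_states)
    if not training:
        return layer_output
    # The drop decision is a tensor resolved at run time, not at trace
    # time, so retracing never changes which variables are created.
    drop = tf.random.uniform([]) < layerdrop
    return tf.cond(drop, lambda: hidden_states, lambda: layer_output)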
