LayerDrop implementation in TensorFlow models is broken in graph mode
See original GitHub issue

Environment info
- `transformers` version: 4.8.1
- Platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.10
- PyTorch version (GPU?): 1.9.0+cu102 (False)
- Tensorflow version (GPU?): 2.5.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help

Information
Model I am using: TFBartForConditionalGeneration

The problem arises when using:
- my own modified scripts (see the reproduction below)

The task I am working on is:
- my own task or dataset (see the reproduction below)
To reproduce

```python
import numpy as np
import tensorflow as tf
from tqdm.auto import tqdm
from transformers import TFBartForConditionalGeneration, BartConfig

# keep layerdrop at a very high value to demonstrate the error quickly
model = TFBartForConditionalGeneration(BartConfig(encoder_layerdrop=0.5))

array = np.random.randint(1, 300, size=(4, 256))
dataset = tf.constant(array, dtype=tf.int32)

# the following works perfectly when `tf.function(...)` is removed
@tf.function
def train_step(tensor):
    return model(tensor, training=True)

for tensor in tqdm(dataset, total=len(dataset)):
    tensor = tf.expand_dims(tensor, 0)
    output = train_step(tensor)
```
You can also check out this small Colab notebook to reproduce the error.
```
ValueError: in user code:

    <ipython-input-5-ca2e97b30313>:4 train_step  *
        return model(tensor, training=True)
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:1393 call  *
        outputs = self.model(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:1125 call  *
        inputs["encoder_outputs"] = self.encoder(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:764 call  *
        hidden_states, attn = encoder_layer(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:305 call  *
        hidden_states, self_attn_weights, _ = self.self_attn(
    /usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_tf_bart.py:178 call  *
        query_states = self.q_proj(hidden_states) * self.scaling
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py:1023 __call__  **
        self._maybe_build(inputs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py:2625 _maybe_build
        self.build(input_shapes)  # pylint:disable=not-callable
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/core.py:1198 build
        trainable=True)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py:655 add_weight
        caching_device=caching_device)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py:815 _add_variable_with_custom_getter
        **kwargs_for_getter)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py:139 make_variable
        shape=variable_shape if variable_shape else None)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:260 __call__
        return cls._variable_v1_call(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:221 _variable_v1_call
        shape=shape)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py:769 invalid_creator_scope
        "tf.function-decorated function tried to create "

    ValueError: tf.function-decorated function tried to create variables on non-first call.
```
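The last line is a documented `tf.function` constraint: variables may only be created during the first call. A minimal standalone sketch of the same pitfall, independent of transformers (the toy function here is illustrative, not from the library):

```python
import tensorflow as tf

dense = tf.keras.layers.Dense(4)  # Keras layers create their weights lazily

@tf.function
def f(x, use_layer):
    # `use_layer` is a Python bool, so each new value triggers a retrace.
    if use_layer:
        return dense(x)  # building Dense here creates variables
    return x

f(tf.ones((1, 4)), False)  # first call: the layer is skipped and never built
f(tf.ones((1, 4)), True)   # retrace: Dense builds -> the same ValueError
```

This mirrors what LayerDrop does: whether a layer runs is decided by a Python-level branch, so a layer that was skipped during the first trace tries to build its variables on a later call.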
Side note: I have checked the same thing for TFWav2Vec2 as well, and the same issue happens there. So possibly all TF models that use layer-drop need to be fixed.
Expected behavior

LayerDrop should work in graph mode just as it does in eager execution.
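Until the implementation is fixed, the simplest workaround sketch (my assumption, not an official recommendation) is to give up the regularization and disable layerdrop when compiling with `tf.function`:

```python
from transformers import TFBartForConditionalGeneration, BartConfig

# Workaround sketch: with layerdrop disabled, every layer runs (and is
# therefore built) during the first trace, so no variables are created
# on later calls.
model = TFBartForConditionalGeneration(BartConfig(encoder_layerdrop=0.0))
```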
Top GitHub Comments
Cool! We’ll hold off on disabling it for now - if you find a solution, let us know, and don’t panic if it turns out to be impossible - just say so and we’ll close this issue and disable layerdrop in graph mode instead. Thanks for your help!
On investigation, I’m pretty sure the issue is caused by the way we’re doing layerdrop: https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/modeling_tf_bart.py#L755-L772
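For context, the relevant pattern is roughly the following (paraphrased and simplified from the linked code, not copied verbatim):

```python
import random

for idx, encoder_layer in enumerate(self.layers):
    dropout_probability = random.uniform(0, 1)
    if training and (dropout_probability < self.layerdrop):
        continue  # Python-level skip: baked into the graph at trace time
    hidden_states, attn = encoder_layer(hidden_states, attention_mask)
```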
This code is correct for eager execution, but in graph mode I suspect it leads to the creation of new variables and graph edges the first time a previously skipped layer actually runs. I can see some workarounds, but unfortunately no perfect ones - this seems like a fundamental limitation of the way graph mode works in TF.
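One imperfect direction is a graph-safe variant: always call the layer so its weights are built on the first trace, and make the skip decision a graph-level op instead of a Python `if`. A sketch under those assumptions (extra compute for skipped layers, and a plain Python bool for `training`):

```python
import tensorflow as tf

def call_with_layerdrop(layer, hidden_states, layerdrop, training):
    # Always execute the layer so its variables exist after the first trace.
    new_states = layer(hidden_states)
    if not training:  # fine as a Python branch: `training` is fixed per trace
        return new_states
    # Graph-level coin flip: decided at runtime, no retracing involved.
    keep = tf.random.uniform([]) >= layerdrop
    return tf.cond(keep, lambda: new_states, lambda: hidden_states)
```

The obvious cost is that a "skipped" layer's forward pass is still computed, which is presumably one of the imperfect trade-offs mentioned above.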
You’re welcome to investigate and try to find a solution if you like, but we’re probably just going to explicitly disable layer drop in graph mode for now.