# Unimplemented error when using AdamWeightDecay in TF
### System Info

- `transformers` version: 4.26.0.dev0
- Platform: Linux-4.15.0-200-generic-x86_64-with-glibc2.17
- Python version: 3.8.13
- Huggingface_hub version: 0.11.1
- PyTorch version (GPU?): 1.10.1+cu102 (True)
- Tensorflow version (GPU?): 2.11.0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
### Reproduction

Coming from #20750: running the example code from that issue, but with `AdamWeightDecay`, triggers the error.
The code:

```python
from transformers import TFAutoModelForSequenceClassification
from transformers.optimization_tf import create_optimizer
from transformers import AutoTokenizer
from datasets import load_dataset
import tensorflow as tf
import numpy as np

dataset = load_dataset("glue", "cola")
dataset = dataset["train"]  # Just take the training split for now

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
tokenized_data = dict(tokenizer(dataset["sentence"], return_tensors="np", padding=True))
labels = np.array(dataset["label"])  # Label is already an array of 0 and 1

# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
optimizer, _ = create_optimizer(3e-5, 600, 100, weight_decay_rate=0.3)
model.compile(optimizer=optimizer, loss='binary_crossentropy')
model.fit(tokenized_data, labels)
```
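For reference, `create_optimizer` returns an `(optimizer, lr_schedule)` tuple, hence the `optimizer, _` unpacking. The same call spelled out with keyword arguments (argument names as in `transformers.optimization_tf`):

```python
from transformers.optimization_tf import create_optimizer

# 3e-5 peak learning rate, 600 total training steps with the first 100
# as linear warmup, and a decoupled weight decay rate of 0.3 handled by
# AdamWeightDecay.
optimizer, lr_schedule = create_optimizer(
    init_lr=3e-5,
    num_train_steps=600,
    num_warmup_steps=100,
    weight_decay_rate=0.3,
)
```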
The traceback:

```
Traceback (most recent call last):
  File "../test_mirrored.py", line 24, in <module>
    model.fit(tokenized_data, labels)
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:

Detected at node 'Cast_1' defined at (most recent call last):
    File "../test_mirrored.py", line 24, in <module>
      model.fit(tokenized_data, labels)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/modeling_tf_utils.py", line 1559, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/optimization_tf.py", line 252, in apply_gradients
      return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 632, in apply_gradients
      self._apply_weight_decay(trainable_variables)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1159, in _apply_weight_decay
      tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1155, in distributed_apply_weight_decay
      distribution.extended.update(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1151, in weight_decay_fn
      wd = tf.cast(self.weight_decay, variable.dtype)
Node: 'Cast_1'
2 root error(s) found.
  (0) UNIMPLEMENTED: Cast string to float is not supported
      [[{{node Cast_1}}]]
  (1) CANCELLED: Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_37329]
```
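The failing node is easy to reproduce in isolation. A minimal sketch of the op the graph is choking on, assuming `self.weight_decay` has somehow ended up holding a string:

```python
import tensorflow as tf

# The op behind node 'Cast_1': TensorFlow has no kernel for casting a
# string tensor to float, so even a numeric-looking string raises the
# same UnimplementedError in eager mode.
tf.cast("0.3", tf.float32)
# tensorflow.python.framework.errors_impl.UnimplementedError:
#   Cast string to float is not supported [Op:Cast]
```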
Setting the weight decay rate to 0.0 does not trigger the error, so I imagine it's something in `AdamWeightDecay`. The TensorFlow 2.11 changelog says:

> The `tf.keras.optimizers.Optimizer` base class now points to the new Keras optimizer, while the old optimizers have been moved to the `tf.keras.optimizers.legacy` namespace.

and

> **Checkpoint loading failure.** The new optimizer handles optimizer state differently from the old optimizer, which simplifies the logic of checkpoint saving/loading, but at the cost of breaking checkpoint backward compatibility in some cases. If you want to keep using an old checkpoint, please change your optimizer to `tf.keras.optimizer.legacy.XXX` (e.g. `tf.keras.optimizer.legacy.Adam`).
>
> **Old optimizer API not found.** The new optimizer, `tf.keras.optimizers.Optimizer`, has a different set of public APIs from the old optimizer. These API changes are mostly related to getting rid of slot variables and TF1 support. Please check the API documentation to find alternatives to the missing API. If you must call the deprecated API, please change your optimizer to the legacy optimizer.
Could it be related to this?
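One hypothesis consistent with the traceback (a sketch, not verified against the Keras source): the legacy `Adam.__init__` took `name` as its sixth positional argument, whereas the new TF 2.11 `Adam.__init__` takes `weight_decay` in that slot. If `AdamWeightDecay` forwards its constructor arguments to `super().__init__()` positionally, its name string would be stored as `weight_decay` and later hit exactly the failing cast:

```python
import tensorflow as tf

# Simulating the suspected argument shift under TF 2.11: the sixth
# positional argument used to be `name` in the legacy Adam, but is
# `weight_decay` in the new optimizer, so a name string passed
# positionally lands in the weight_decay slot.
opt = tf.keras.optimizers.Adam(3e-5, 0.9, 0.999, 1e-7, False, "AdamWeightDecay")
print(opt.weight_decay)  # "AdamWeightDecay" -- a string where a float belongs

# fit() later fails in apply_gradients, when the base class evaluates
# tf.cast(self.weight_decay, variable.dtype) on that string.
```

This would also explain the `weight_decay_rate=0.0` observation: if I'm reading `create_optimizer` right, it only builds an `AdamWeightDecay` when the rate is positive and falls back to plain `Adam` otherwise. In the meantime, a possible workaround sketch (note it drops `AdamWeightDecay`'s weight-decay exclusions and the warmup schedule, so it is not a drop-in replacement):

```python
import tensorflow as tf

# The new TF 2.11 Adam applies decoupled weight decay itself when its
# weight_decay argument is set, so AdamWeightDecay can be skipped
# entirely. Unlike create_optimizer, this uses a fixed learning rate
# and decays every variable, including LayerNorm weights and biases.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, weight_decay=0.3)
model.compile(optimizer=optimizer, loss='binary_crossentropy')
```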
### Expected behavior

Training completes successfully.
### Top GitHub Comments

> Hi @ZJaume, we saw this issue earlier but thought we had fixed it with #20735. I'll investigate now and see if I can reproduce it.

> Working. Thank you!