
Unimplemented error when using AdamWeightDecay in TF

See original GitHub issue

System Info

  • transformers version: 4.26.0.dev0
  • Platform: Linux-4.15.0-200-generic-x86_64-with-glibc2.17
  • Python version: 3.8.13
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.10.1+cu102 (True)
  • Tensorflow version (GPU?): 2.11.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Coming from #20750: running the example code from that issue, but with AdamWeightDecay, triggers the error.

The code:

from transformers import TFAutoModelForSequenceClassification
from transformers.optimization_tf import create_optimizer
from transformers import AutoTokenizer
from datasets import load_dataset
import numpy as np

dataset = load_dataset("glue", "cola")
dataset = dataset["train"]  # Just take the training split for now


tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
tokenized_data = dict(tokenizer(dataset["sentence"], return_tensors="np", padding=True))

labels = np.array(dataset["label"])  # Label is already an array of 0 and 1

# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
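# create_optimizer returns (optimizer, lr_schedule); with a non-zero
# weight_decay_rate it builds an AdamWeightDecay optimizer (plain Adam otherwise)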
optimizer, _ = create_optimizer(3e-5, 600, 100, weight_decay_rate=0.3)
model.compile(optimizer=optimizer, loss='binary_crossentropy')

model.fit(tokenized_data, labels)

The traceback:

Traceback (most recent call last):
  File "../test_mirrored.py", line 24, in <module>
    model.fit(tokenized_data, labels)
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:

Detected at node 'Cast_1' defined at (most recent call last):
    File "../test_mirrored.py", line 24, in <module>
      model.fit(tokenized_data, labels)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/modeling_tf_utils.py", line 1559, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/optimization_tf.py", line 252, in apply_gradients
      return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 632, in apply_gradients
      self._apply_weight_decay(trainable_variables)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1159, in _apply_weight_decay
      tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1155, in distributed_apply_weight_decay
      distribution.extended.update(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1151, in weight_decay_fn
      wd = tf.cast(self.weight_decay, variable.dtype)
Node: 'Cast_1'
2 root error(s) found.
  (0) UNIMPLEMENTED:  Cast string to float is not supported
         [[{{node Cast_1}}]]
  (1) CANCELLED:  Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_37329]
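
The final frame casts self.weight_decay to the variable's dtype, and the error complains about a string-to-float cast, which suggests self.weight_decay holds a string rather than a number at that point. The cast failure reproduces in isolation (a minimal sketch; that a string is involved is inferred from the error message, not stated in the thread):

import tensorflow as tf

# tf.cast does not parse strings; casting a string tensor to a float dtype
# raises the same UnimplementedError seen in the training traceback.
tf.cast("AdamWeightDecay", tf.float32)
# tensorflow.python.framework.errors_impl.UnimplementedError:
#     Cast string to float is not supported [Op:Cast]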

Setting the weight decay to 0.0 does not trigger the error, so I suspect it is something in AdamWeightDecay. The TensorFlow changelog says:

The tf.keras.optimizers.Optimizer base class now points to the new Keras optimizer, while the old optimizers have been moved to the tf.keras.optimizers.legacy namespace.

and

Checkpoint loading failure. The new optimizer handles optimizer state differently from the old optimizer, which simplifies the logic of checkpoint saving/loading, but at the cost of breaking checkpoint backward compatibility in some cases. If you want to keep using an old checkpoint, please change your optimizer to tf.keras.optimizer.legacy.XXX (e.g. tf.keras.optimizer.legacy.Adam).

Old optimizer API not found. The new optimizer, tf.keras.optimizers.Optimizer, has a different set of public APIs from the old optimizer. These API changes are mostly related to getting rid of slot variables and TF1 support. Please check the API documentation to find alternatives to the missing API. If you must call the deprecated API, please change your optimizer to the legacy optimizer.

Could it be related to this?
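
A plausible mechanism, sketched below under the assumption that TF 2.11's new Adam constructor places a weight_decay parameter before name in its positional order (that signature is an assumption here, not something quoted in the thread): a subclass that still forwards name positionally, as the pre-2.11 signature allowed, would silently drop the string into the weight_decay slot, and that string would then reach the tf.cast call shown in the traceback.

import tensorflow as tf

# Old-style positional call: learning_rate, beta_1, beta_2, epsilon, amsgrad,
# and then what used to be `name`. If the sixth positional parameter is now
# `weight_decay`, the string lands there instead.
opt = tf.keras.optimizers.Adam(1e-3, 0.9, 0.999, 1e-7, False, "AdamWeightDecay")
print(opt.weight_decay)  # 'AdamWeightDecay' -- a string, later fed to tf.cast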

Expected behavior

Train successfully.

Issue Analytics

  • State: closed
  • Created: 9 months ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

Rocketknight1 commented, Dec 20, 2022 (1 reaction)

Hi @ZJaume, we saw this issue earlier but thought we had fixed it with #20735. I’ll investigate now and see if I can reproduce it.

ZJaume commented, Dec 20, 2022

Working. Thank you!
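
For anyone stuck on an affected transformers release, a possible stopgap (an editor's suggestion, not from the thread) is to skip AdamWeightDecay and use the decoupled weight decay that Keras's own Adam gained in TF 2.11. The decay semantics differ: the create_optimizer call above configures AdamWeightDecay to skip LayerNorm and bias parameters, while Keras's Adam decays every variable unless exclusions are registered, and the exclude_from_weight_decay call below is assumed to be available in your Keras version.

import tensorflow as tf

# Stopgap sketch (assumes TF >= 2.11): use Keras's built-in decoupled weight
# decay instead of transformers' AdamWeightDecay.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, weight_decay=0.3)
# Mimic the exclusions create_optimizer sets up: variables whose names contain
# these substrings are not decayed.
optimizer.exclude_from_weight_decay(var_names=["LayerNorm", "layer_norm", "bias"])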
