
Unimplemented error when using AdamWeightDecay in TF

See original GitHub issue

System Info

  • transformers version: 4.26.0.dev0
  • Platform: Linux-4.15.0-200-generic-x86_64-with-glibc2.17
  • Python version: 3.8.13
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.10.1+cu102 (True)
  • Tensorflow version (GPU?): 2.11.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Coming from #20750: running the example code from that issue, but with AdamWeightDecay, triggers the error.

The code:

from transformers import TFAutoModelForSequenceClassification
from transformers.optimization_tf import create_optimizer
from transformers import AutoTokenizer
from datasets import load_dataset
import numpy as np

dataset = load_dataset("glue", "cola")
dataset = dataset["train"]  # Just take the training split for now


tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
tokenized_data = dict(tokenizer(dataset["sentence"], return_tensors="np", padding=True))

labels = np.array(dataset["label"])  # Label is already an array of 0 and 1

# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
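# create_optimizer returns (optimizer, lr_schedule); with a non-zero
# weight_decay_rate it builds an AdamWeightDecay optimizer (plain Adam otherwise)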
optimizer, _ = create_optimizer(3e-5, 600, 100, weight_decay_rate=0.3)
model.compile(optimizer=optimizer, loss='binary_crossentropy')

model.fit(tokenized_data, labels)

The traceback:

Traceback (most recent call last):
  File "../test_mirrored.py", line 24, in <module>
    model.fit(tokenized_data, labels)
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:

Detected at node 'Cast_1' defined at (most recent call last):
    File "../test_mirrored.py", line 24, in <module>
      model.fit(tokenized_data, labels)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/modeling_tf_utils.py", line 1559, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/optimization_tf.py", line 252, in apply_gradients
      return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 632, in apply_gradients
      self._apply_weight_decay(trainable_variables)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1159, in _apply_weight_decay
      tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1155, in distributed_apply_weight_decay
      distribution.extended.update(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1151, in weight_decay_fn
      wd = tf.cast(self.weight_decay, variable.dtype)
Node: 'Cast_1'
2 root error(s) found.
  (0) UNIMPLEMENTED:  Cast string to float is not supported
         [[{{node Cast_1}}]]
  (1) CANCELLED:  Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_37329]
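
The final frame casts self.weight_decay to the variable's dtype, and the error complains about a string-to-float cast, which suggests self.weight_decay holds a string rather than a number at that point. The cast failure reproduces in isolation (a minimal sketch; that a string is involved is inferred from the error message, not stated in the thread):

import tensorflow as tf

# tf.cast does not parse strings; casting a string tensor to a float dtype
# raises the same UnimplementedError seen in the training traceback.
tf.cast("AdamWeightDecay", tf.float32)
# tensorflow.python.framework.errors_impl.UnimplementedError:
#     Cast string to float is not supported [Op:Cast]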

Setting the weight decay to 0.0 does not trigger the error, so I suspect it is something in AdamWeightDecay. The TensorFlow changelog says:

The tf.keras.optimizers.Optimizer base class now points to the new Keras optimizer, while the old optimizers have been moved to the tf.keras.optimizers.legacy namespace.

and

Checkpoint loading failure. The new optimizer handles optimizer state differently from the old optimizer, which simplifies the logic of checkpoint saving/loading, but at the cost of breaking checkpoint backward compatibility in some cases. If you want to keep using an old checkpoint, please change your optimizer to tf.keras.optimizer.legacy.XXX (e.g. tf.keras.optimizer.legacy.Adam).

Old optimizer API not found. The new optimizer, tf.keras.optimizers.Optimizer, has a different set of public APIs from the old optimizer. These API changes are mostly related to getting rid of slot variables and TF1 support. Please check the API documentation to find alternatives to the missing API. If you must call the deprecated API, please change your optimizer to the legacy optimizer.

Could it be related to this?
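
A plausible mechanism, sketched below under the assumption that TF 2.11's new Adam constructor places a weight_decay parameter before name in its positional order (that signature is an assumption here, not something quoted in the thread): a subclass that still forwards name positionally, as the pre-2.11 signature allowed, would silently drop the string into the weight_decay slot, and that string would then reach the tf.cast call shown in the traceback.

import tensorflow as tf

# Old-style positional call: learning_rate, beta_1, beta_2, epsilon, amsgrad,
# and then what used to be `name`. If the sixth positional parameter is now
# `weight_decay`, the string lands there instead.
opt = tf.keras.optimizers.Adam(1e-3, 0.9, 0.999, 1e-7, False, "AdamWeightDecay")
print(opt.weight_decay)  # 'AdamWeightDecay' -- a string, later fed to tf.cast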

Expected behavior

Train successfully.

Issue Analytics

  • State: closed
  • Created: 9 months ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

Rocketknight1 commented, Dec 20, 2022 (1 reaction)

Hi @ZJaume, we saw this issue earlier but thought we had fixed it with #20735. I’ll investigate now and see if I can reproduce it.

ZJaume commented, Dec 20, 2022

Working. Thank you!
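
For anyone stuck on an affected transformers release, a possible stopgap (an editor's suggestion, not from the thread) is to skip AdamWeightDecay and use the decoupled weight decay that Keras's own Adam gained in TF 2.11. The decay semantics differ: the create_optimizer call above configures AdamWeightDecay to skip LayerNorm and bias parameters, while Keras's Adam decays every variable unless exclusions are registered, and the exclude_from_weight_decay call below is assumed to be available in your Keras version.

import tensorflow as tf

# Stopgap sketch (assumes TF >= 2.11): use Keras's built-in decoupled weight
# decay instead of transformers' AdamWeightDecay.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, weight_decay=0.3)
# Mimic the exclusions create_optimizer sets up: variables whose names contain
# these substrings are not decayed.
optimizer.exclude_from_weight_decay(var_names=["LayerNorm", "layer_norm", "bias"])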
