Tensorflow 2 Finetuning TF T5 using keras fit
The Problem
I have been trying to fine-tune the T5 model using TensorFlow and Keras. There is no documentation (official or community) or notebook for fine-tuning T5 in TensorFlow. There are a few lines here and some fine-tuning instructions here; other than that, there is nothing for TensorFlow.
Environment info
- transformers version: 3.0.2
- Platform: Linux-4.15.0-112-generic-x86_64-with-debian-buster-sid
- Python version: Linux-4.15.0-112-generic-x86_64-with-debian-buster-sid
- PyTorch version (GPU?): 1.6.0 (True)
- Tensorflow version (GPU?): 2.2.0 (True)
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help
Information
Model I am using (Bert, XLNet …): TFT5ForConditionalGeneration (TFAutoModelWithLMHead) pretrained
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The tasks I am working on is:
- an official GLUE/SQUaD task: (SQuad from tfds)
- my own task or dataset: (give details below)
To reproduce
import tensorflow as tf
import tensorflow_datasets as tfds
from transformers import AutoTokenizer, TFAutoModelWithLMHead

model = TFAutoModelWithLMHead.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")
train_dataset, info = tfds.load('squad', split='train', with_info=True)

def encode_tf(inputs):
    """Encode a SQuAD example with the tokenizer.

    Returns:
        dict: a dictionary with keys 'input_ids', 'attention_mask',
        'decoder_attention_mask' and 'labels', each mapped to the
        corresponding tensor.
    """
    pass  # implementation omitted in the original issue

dataset = train_dataset.map(encode_tf)
dataset = dataset.shuffle(1000)  # shuffle with a buffer of 1000 examples
dataset = dataset.batch(8)
Sample data output:
data = next(iter(dataset))
data
{'input_ids': <tf.Tensor: shape=(8, 200), dtype=int32, numpy=
array([[ 987, 834, 7771, ..., 0, 0, 0],
[ 987, 834, 7771, ..., 2749, 3385, 12187],
[ 987, 834, 7771, ..., 0, 0, 0],
...,
[ 987, 834, 7771, ..., 0, 0, 0],
[ 987, 834, 7771, ..., 0, 0, 0],
[ 987, 834, 7771, ..., 6, 30, 8]], dtype=int32)>,
'labels': <tf.Tensor: shape=(8, 200), dtype=int32, numpy=
array([[ 363, 19, 80, ..., 0, 0, 0],
[4504, 149, 186, ..., 0, 0, 0],
[ 571, 54, 3298, ..., 0, 0, 0],
...,
[2645, 2832, 4599, ..., 0, 0, 0],
[ 571, 103, 7000, ..., 0, 0, 0],
[ 366, 410, 8, ..., 0, 0, 0]], dtype=int32)>,
'attention_mask': <tf.Tensor: shape=(8, 200), dtype=int32, numpy=
array([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 1, 1, 1]], dtype=int32)>,
'decoder_attention_mask': <tf.Tensor: shape=(8, 200), dtype=int32, numpy=
array([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]], dtype=int32)>}
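Since the body of encode_tf is not shown, here is a minimal pure-Python sketch of the padding and masking that the sample batch above implies: sequences right-padded with 0 to a fixed length of 200, with masks that are 1 on real tokens and 0 on padding. The helper names pad_with_mask and build_features are hypothetical, and the tokenizer call that would produce the token ids is deliberately left out. This is an illustration of the expected output layout, not the author's actual implementation.

```python
from typing import Dict, List, Tuple

PAD_ID = 0     # padding value visible in the sample batch
MAX_LEN = 200  # sequence length visible in the sample batch


def pad_with_mask(token_ids: List[int], max_len: int = MAX_LEN,
                  pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Truncate or right-pad token_ids to max_len and build the 0/1 mask."""
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


def build_features(source_ids: List[int], target_ids: List[int],
                   max_len: int = MAX_LEN) -> Dict[str, List[int]]:
    """Assemble the four keys that the encode_tf docstring promises.

    source_ids / target_ids are assumed to be already-tokenized inputs
    and answers (hypothetical arguments, not part of the original code).
    """
    input_ids, attention_mask = pad_with_mask(source_ids, max_len)
    labels, decoder_attention_mask = pad_with_mask(target_ids, max_len)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
        "decoder_attention_mask": decoder_attention_mask,
    }
```

Calling build_features([987, 834, 7771], [363, 19], max_len=5) would give input_ids of [987, 834, 7771, 0, 0] with attention_mask [1, 1, 1, 0, 0], mirroring the layout of the tensors printed above.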
Training
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)
model.fit(dataset, epochs=10)
model.fit results in the following error: ValueError: No gradients provided for any variable
The Stacktrace
Epoch 1/10
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-163-f8c5e0c71664> in <module>
----> 1 model.fit(dataset, epochs=10)
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
64 def _method_wrapper(self, *args, **kwargs):
65 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
---> 66 return method(self, *args, **kwargs)
67
68 # Running inside `run_distribute_coordinator` already.
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
846 batch_size=batch_size):
847 callbacks.on_train_batch_begin(step)
--> 848 tmp_logs = train_function(iterator)
849 # Catch OutOfRangeError for Datasets of unknown size.
850 # This blocks until the batch has finished executing.
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
578 xla_context.Exit()
579 else:
--> 580 result = self._call(*args, **kwds)
581
582 if tracing_count == self._get_tracing_count():
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
625 # This is the first call of __call__, so we have to initialize.
626 initializers = []
--> 627 self._initialize(args, kwds, add_initializers_to=initializers)
628 finally:
629 # At this point we know that the initialization is complete (or less
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
504 self._concrete_stateful_fn = (
505 self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
--> 506 *args, **kwds))
507
508 def invalid_creator_scope(*unused_args, **unused_kwds):
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
2444 args, kwargs = None, None
2445 with self._lock:
-> 2446 graph_function, _, _ = self._maybe_define_function(args, kwargs)
2447 return graph_function
2448
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
2775
2776 self._function_cache.missed.add(call_context_key)
-> 2777 graph_function = self._create_graph_function(args, kwargs)
2778 self._function_cache.primary[cache_key] = graph_function
2779 return graph_function, args, kwargs
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
2665 arg_names=arg_names,
2666 override_flat_arg_shapes=override_flat_arg_shapes,
-> 2667 capture_by_value=self._capture_by_value),
2668 self._function_attributes,
2669 # Tell the ConcreteFunction to clean up its graph once it goes out of
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
979 _, original_func = tf_decorator.unwrap(python_func)
980
--> 981 func_outputs = python_func(*func_args, **func_kwargs)
982
983 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
439 # __wrapped__ allows AutoGraph to swap in a converted function. We give
440 # the function a weak reference to itself to avoid a reference cycle.
--> 441 return weak_wrapped_fn().__wrapped__(*args, **kwds)
442 weak_wrapped_fn = weakref.ref(wrapped_fn)
443
~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
966 except Exception as e: # pylint:disable=broad-except
967 if hasattr(e, "ag_error_metadata"):
--> 968 raise e.ag_error_metadata.to_exception(e)
969 else:
970 raise
ValueError: in user code:
/home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:571 train_function *
outputs = self.distribute_strategy.run(
/home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:951 run **
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
return fn(*args, **kwargs)
/home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:541 train_step **
self.trainable_variables)
/home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:1804 _minimize
trainable_variables))
/home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:521 _aggregate_gradients
filtered_grads_and_vars = _filter_grads(grads_and_vars)
/home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:1219 _filter_grads
([v.name for _, v in grads_and_vars],))
ValueError: No gradients provided for any variable: ['shared/shared/weight:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/relative_attention_bias/embeddings:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/o/kernel:0', 
'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/k/kernel:0', 
'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._1/layer_norm/weight:0', 
'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._1/DenseReluDense/wi/kernel:0', 
'tf_t5for_conditional_generation/encoder/block_._10/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/final_layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/relative_attention_bias/embeddings:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/relative_attention_bias/embeddings:0', 
'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/layer_norm/weight:0', 
'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/q/kernel:0', 
'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/layer_norm/weight:0', 
'tf_t5for_conditional_generation/decoder/block_._5/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/q/kernel:0', 
'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/k/kernel:0', 
'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._2/DenseReluDense/wi/kernel:0', 
'tf_t5for_conditional_generation/decoder/block_._10/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/final_layer_norm/weight:0'].
Expected behavior
Should be able to run the training loop for the specified epochs.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:7 (6 by maintainers)
The device placement strategy works and the error is no longer there. I should point out that this is not the usual way to train a model in TF: we normally do not need to place the model explicitly on a device when creating it.
You should create your model inside a strategy scope.
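The thread does not show the exact code that was used, so the following is only a sketch of what creating and compiling a model inside a strategy scope can look like. It assumes a single-device tf.distribute.OneDeviceStrategy on the CPU and uses a tiny Dense model with random data as a stand-in; in the issue, the model created inside the scope would be TFAutoModelWithLMHead.from_pretrained("t5-base"), and the device and strategy type would depend on the setup.

```python
import numpy as np
import tensorflow as tf

# Assumption: one device is enough here; other tf.distribute strategies
# are used the same way through strategy.scope().
strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")

with strategy.scope():
    # Stand-in model (hypothetical); variables created here are placed
    # according to the strategy.
    model = tf.keras.Sequential([tf.keras.layers.Dense(3)])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Random stand-in data: 16 examples, 4 features, 3 classes.
x = np.random.rand(16, 4).astype("float32")
y = np.random.randint(0, 3, size=(16,))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(8)

# fit is called outside the scope; Keras reuses the strategy captured
# when the model was created.
history = model.fit(dataset, epochs=1, verbose=0)
```

Note that the stand-in dataset yields (inputs, targets) pairs, which is the element structure Keras fit uses to separate features from labels.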