Tensorflow 2 Finetuning TF T5 using keras fit

The Problem

I have been trying to fine-tune the T5 model using TensorFlow and Keras. There is no documentation (official or community) or notebook for fine-tuning T5 in TensorFlow. There are a few lines here and some fine-tuning instructions here; other than that, there is nothing for TensorFlow.

Environment info

transformers version: 3.0.2
Platform: Linux-4.15.0-112-generic-x86_64-with-debian-buster-sid
Python version: 3.7
PyTorch version (GPU?): 1.6.0 (True)
Tensorflow version (GPU?): 2.2.0 (True)
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

Who can help

@patrickvonplaten @jplu

Information

Model I am using (Bert, XLNet …): TFT5ForConditionalGeneration (TFAutoModelWithLMHead) pretrained

The problem arises when using:

the official example scripts: (give details below)
my own modified scripts: (give details below)

The task I am working on is:

an official GLUE/SQuAD task: (SQuAD from tfds)
my own task or dataset: (give details below)

To reproduce

import tensorflow as tf
import tensorflow_datasets as tfds
from transformers import AutoTokenizer, TFAutoModelWithLMHead

model = TFAutoModelWithLMHead.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")

train_dataset, info = tfds.load('squad', split='train', with_info=True)

def encode_tf(inputs):
    """Encode a SQuAD example with the tokenizer.

    Returns:
        dict: a dictionary with the keys 'input_ids', 'attention_mask',
        'decoder_attention_mask' and 'labels', each holding the
        corresponding tensor.
    """
    pass  # body omitted in the original report

dataset = train_dataset.map(encode_tf)
dataset = dataset.shuffle(1000)
dataset = dataset.batch(8)

Sample data output:

data = next(iter(dataset))
data

{'input_ids': <tf.Tensor: shape=(8, 200), dtype=int32, numpy=
  array([[  987,   834,  7771, ...,     0,     0,     0],
         [  987,   834,  7771, ...,  2749,  3385, 12187],
         [  987,   834,  7771, ...,     0,     0,     0],
         ...,
         [  987,   834,  7771, ...,     0,     0,     0],
         [  987,   834,  7771, ...,     0,     0,     0],
         [  987,   834,  7771, ...,     6,    30,     8]], dtype=int32)>,
  'labels': <tf.Tensor: shape=(8, 200), dtype=int32, numpy=
  array([[ 363,   19,   80, ...,    0,    0,    0],
         [4504,  149,  186, ...,    0,    0,    0],
         [ 571,   54, 3298, ...,    0,    0,    0],
         ...,
         [2645, 2832, 4599, ...,    0,    0,    0],
         [ 571,  103, 7000, ...,    0,    0,    0],
         [ 366,  410,    8, ...,    0,    0,    0]], dtype=int32)>,
  'attention_mask': <tf.Tensor: shape=(8, 200), dtype=int32, numpy=
  array([[1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 0, 0, 0],
         ...,
         [1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 1, 1, 1]], dtype=int32)>,
  'decoder_attention_mask': <tf.Tensor: shape=(8, 200), dtype=int32, numpy=
  array([[1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 0, 0, 0],
         ...,
         [1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 0, 0, 0]], dtype=int32)>}
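
To make the relationship between the padded ids and the masks in this output concrete, here is a minimal pure-Python sketch. It is not the reporter's `encode_tf` (whose body was not shared). The helper name `pad_with_mask` is hypothetical, and the padding id of 0 and length of 200 are assumptions read off the trailing zeros and the (8, 200) shapes above.

```python
from typing import Dict, List

PAD_ID = 0     # assumption: padding id, as the trailing zeros in the sample suggest
MAX_LEN = 200  # assumption: matches the (8, 200) shapes in the sample output


def pad_with_mask(token_ids: List[int],
                  max_len: int = MAX_LEN,
                  pad_id: int = PAD_ID) -> Dict[str, List[int]]:
    """Truncate or right-pad token ids to max_len and build the matching mask."""
    ids = token_ids[:max_len]          # truncate sequences that are too long
    n_pad = max_len - len(ids)         # number of padding positions needed
    return {
        "input_ids": ids + [pad_id] * n_pad,
        # 1 marks a real token, 0 marks padding the model should ignore
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }


example = pad_with_mask([987, 834, 7771], max_len=6)
print(example)
# {'input_ids': [987, 834, 7771, 0, 0, 0], 'attention_mask': [1, 1, 1, 0, 0, 0]}
```

Rows whose mask ends in ones, like the second row of `attention_mask` above, are sequences that filled (or were truncated to) the full length.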

Training

optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)

model.fit(dataset, epochs=10)

model.fit results in the following error: ValueError: No gradients provided for any variable

The Stacktrace

Epoch 1/10
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-163-f8c5e0c71664> in <module>
----> 1 model.fit(dataset, epochs=10)

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
     64   def _method_wrapper(self, *args, **kwargs):
     65     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
---> 66       return method(self, *args, **kwargs)
     67 
     68     # Running inside `run_distribute_coordinator` already.

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
    846                 batch_size=batch_size):
    847               callbacks.on_train_batch_begin(step)
--> 848               tmp_logs = train_function(iterator)
    849               # Catch OutOfRangeError for Datasets of unknown size.
    850               # This blocks until the batch has finished executing.

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    578         xla_context.Exit()
    579     else:
--> 580       result = self._call(*args, **kwds)
    581 
    582     if tracing_count == self._get_tracing_count():

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    625       # This is the first call of __call__, so we have to initialize.
    626       initializers = []
--> 627       self._initialize(args, kwds, add_initializers_to=initializers)
    628     finally:
    629       # At this point we know that the initialization is complete (or less

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
    504     self._concrete_stateful_fn = (
    505         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
--> 506             *args, **kwds))
    507 
    508     def invalid_creator_scope(*unused_args, **unused_kwds):

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   2444       args, kwargs = None, None
   2445     with self._lock:
-> 2446       graph_function, _, _ = self._maybe_define_function(args, kwargs)
   2447     return graph_function
   2448 

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   2775 
   2776       self._function_cache.missed.add(call_context_key)
-> 2777       graph_function = self._create_graph_function(args, kwargs)
   2778       self._function_cache.primary[cache_key] = graph_function
   2779       return graph_function, args, kwargs

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   2665             arg_names=arg_names,
   2666             override_flat_arg_shapes=override_flat_arg_shapes,
-> 2667             capture_by_value=self._capture_by_value),
   2668         self._function_attributes,
   2669         # Tell the ConcreteFunction to clean up its graph once it goes out of

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    979         _, original_func = tf_decorator.unwrap(python_func)
    980 
--> 981       func_outputs = python_func(*func_args, **func_kwargs)
    982 
    983       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
    439         # __wrapped__ allows AutoGraph to swap in a converted function. We give
    440         # the function a weak reference to itself to avoid a reference cycle.
--> 441         return weak_wrapped_fn().__wrapped__(*args, **kwds)
    442     weak_wrapped_fn = weakref.ref(wrapped_fn)
    443 

~/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    966           except Exception as e:  # pylint:disable=broad-except
    967             if hasattr(e, "ag_error_metadata"):
--> 968               raise e.ag_error_metadata.to_exception(e)
    969             else:
    970               raise

ValueError: in user code:

    /home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
    /home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    /home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:541 train_step  **
        self.trainable_variables)
    /home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:1804 _minimize
        trainable_variables))
    /home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:521 _aggregate_gradients
        filtered_grads_and_vars = _filter_grads(grads_and_vars)
    /home/ml/anaconda3/envs/hugging/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:1219 _filter_grads
        ([v.name for _, v in grads_and_vars],))

    ValueError: No gradients provided for any variable: ['shared/shared/weight:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/relative_attention_bias/embeddings:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._0/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._1/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/o/kernel:0', 
'tf_t5for_conditional_generation/encoder/block_._2/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._2/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._3/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._4/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/k/kernel:0', 
'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._5/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._6/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._7/layer_._1/layer_norm/weight:0', 
'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._8/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._9/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._1/DenseReluDense/wi/kernel:0', 
'tf_t5for_conditional_generation/encoder/block_._10/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._10/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._1/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._1/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/encoder/block_._11/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/encoder/final_layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/relative_attention_bias/embeddings:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/relative_attention_bias/embeddings:0', 
'tf_t5for_conditional_generation/decoder/block_._0/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._0/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._1/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._0/layer_norm/weight:0', 
'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._2/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._3/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/q/kernel:0', 
'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._4/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._1/layer_norm/weight:0', 
'tf_t5for_conditional_generation/decoder/block_._5/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._5/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._6/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/q/kernel:0', 
'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._7/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._8/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/k/kernel:0', 
'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._9/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._2/DenseReluDense/wi/kernel:0', 
'tf_t5for_conditional_generation/decoder/block_._10/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._10/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._0/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/q/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/k/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/v/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/o/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._1/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._2/DenseReluDense/wi/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._2/DenseReluDense/wo/kernel:0', 'tf_t5for_conditional_generation/decoder/block_._11/layer_._2/layer_norm/weight:0', 'tf_t5for_conditional_generation/decoder/final_layer_norm/weight:0'].
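
For readers unfamiliar with this message: it is raised when every gradient handed to the optimizer is None, which happens when the loss value does not depend on the model's variables. The sketch below reproduces that condition with a toy variable rather than T5. The names `w`, `connected_loss` and `disconnected_loss` are illustrative assumptions, and this is not a confirmed diagnosis of the report above.

```python
import tensorflow as tf

# A toy trainable variable standing in for the model weights (assumption).
w = tf.Variable([[1.0], [2.0]])
x = tf.constant([[3.0, 4.0]])
y = tf.constant([[1.0]])

with tf.GradientTape(persistent=True) as tape:
    pred = tf.matmul(x, w)
    # This loss is computed from w, so a gradient exists.
    connected_loss = tf.reduce_mean(tf.square(pred - y))
    # This loss is a constant with no path back to w.
    disconnected_loss = tf.zeros(shape=())

connected_grad = tape.gradient(connected_loss, w)
disconnected_grad = tape.gradient(disconnected_loss, w)
del tape  # release the persistent tape

print("connected gradient is None:", connected_grad is None)
print("disconnected gradient is None:", disconnected_grad is None)
```

One situation that can produce a disconnected loss with `fit` is a dataset that yields only a dictionary of features and no separate target, so the compiled loss has nothing to compare against. Whether that applies here is not confirmed by the thread.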

Expected behavior

Should be able to run the training loop for the specified epochs.

Issue Analytics

Created 3 years ago
Reactions: 1
Comments: 7 (6 by maintainers)

Top GitHub Comments

HarrisDePerceptron commented, Aug 31, 2020 (1 reaction)

The device placement strategy works and the error is gone. I should point out that this is not the usual way to train a model in TF: we normally do not need to place the model explicitly on a device when creating it.

jplu commented, Aug 31, 2020 (1 reaction)

You should create your model inside a strategy scope.
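
The thread does not show the code that was used, so the following is only a sketch of what creating and compiling a model under a distribution strategy can look like. It assumes `tf.distribute.OneDeviceStrategy` on the CPU and uses a tiny stand-in Keras model with random data in place of T5 and SQuAD; in the reporter's setup the `from_pretrained("t5-base")` call would presumably go inside the same `with` block.

```python
import numpy as np
import tensorflow as tf

# Assumption: a single-device strategy; swap the device string for "/gpu:0" if available.
strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")

with strategy.scope():
    # Stand-in model (assumption); variables must be created inside the scope.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Dense(4),
    ])
    optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer=optimizer, loss=loss)

# Random toy data: the dataset yields (features, labels) pairs.
features = np.random.rand(16, 3).astype("float32")
labels = np.random.randint(0, 4, size=(16,))
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(8)

history = model.fit(dataset, epochs=1, verbose=0)
print("loss after one epoch:", history.history["loss"][0])
```

Note that this toy dataset yields `(features, labels)` tuples, which gives the compiled Keras loss a target to work with; the dataset in the report yields a single dictionary.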