GPT2 for classification - Errors encountered while running run_glue.py and (possible) fixes
Here is a description of a series of errors I encountered while fine-tuning the pre-trained gpt2 model using run_glue.py (these were also reported here). I also describe the code changes I had to make to fix these errors. If the maintainers of the code base are happy with the changes, I will be glad to check them in, provided the instructions for submitting a patch, getting it reviewed, and checking it in are shared with me.
Environment info
- transformers version: 4.10.0.dev0
- Platform: Linux-5.4.0-1051-azure-x86_64-with-glibc2.10
- Python version: 3.8.1
- PyTorch version (GPU?): 1.9.0
- Tensorflow version (GPU?): 2.3.0
- Using GPU in script?: Yes (1 gpu)
- Using distributed or parallel set-up in script?:
Who can help
@patrickvonplaten, @sgugger, @patil-suraj
Model I am using (Bert, XLNet …): GPT2
The problem arises when using:
- the official example scripts: (give details below) examples/tensorflow/text-classification/run_glue.py
The task I am working on is:
- an official GLUE/SQUaD task: GLUE
To reproduce
Steps to reproduce the behavior: (applicable to any GLUE classification task)
- python run_glue.py --model_name_or_path gpt2 --task_name sst2 --do_train --do_eval --do_predict --output_dir ./output
Error 1
File "run_glue.py", line 567, in <module>
    main()
File "run_glue.py", line 415, in main
    optimizer = tf.keras.optimizers.Adam(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/adam.py", line 115, in __init__
    super(Adam, self).__init__(name, **kwargs)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 335, in __init__
    raise ValueError("Gradient clipping in the optimizer "
ValueError: Gradient clipping in the optimizer (by setting clipnorm or clipvalue) is currently unsupported when using a distribution strategy.
Fix: Don't set the clipnorm parameter, i.e. remove the following argument from the tf.keras.optimizers.Adam(...) call in run_glue.py:
clipnorm=training_args.max_grad_norm,
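A minimal sketch of the same idea, in case dropping the argument unconditionally is not wanted. The helper name build_adam_kwargs and its uses_distribution_strategy flag are hypothetical and not part of run_glue.py; the sketch only assembles keyword arguments and leaves out clipnorm whenever a distribution strategy is in use.

```python
from typing import Any, Dict, Optional


def build_adam_kwargs(
    learning_rate: float,
    max_grad_norm: Optional[float],
    uses_distribution_strategy: bool,
) -> Dict[str, Any]:
    """Collect keyword arguments for tf.keras.optimizers.Adam (hypothetical helper).

    clipnorm is only included when no distribution strategy is in use,
    because the ValueError above shows it is rejected otherwise.
    """
    kwargs: Dict[str, Any] = {"learning_rate": learning_rate}
    if max_grad_norm is not None and not uses_distribution_strategy:
        kwargs["clipnorm"] = max_grad_norm
    return kwargs


# Assumed usage inside the script (requires TensorFlow, not imported here):
#   optimizer = tf.keras.optimizers.Adam(**build_adam_kwargs(...))
with_strategy = build_adam_kwargs(5e-5, 1.0, uses_distribution_strategy=True)
without_strategy = build_adam_kwargs(5e-5, 1.0, uses_distribution_strategy=False)
print("clipnorm" in with_strategy)     # False
print("clipnorm" in without_strategy)  # True
```

Since the traceback shows the script running under a strategy (OneDeviceStrategy appears in Error 2), the flag would presumably be True for this run, which reduces to the fix described above.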
Error 2
ValueError: in user code:

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:806 train_function *
    return step_function(self, iterator)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:796 step_function **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/distribute/one_device_strategy.py:184 run
    return super(OneDeviceStrategy, self).run(fn, args, kwargs, options)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/distribute/one_device_strategy.py:367 _call_for_each_replica
    return fn(*args, **kwargs)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:789 run_step **
    outputs = model.train_step(data)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:748 train_step
    loss = self.compiled_loss(
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/compile_utils.py:204 __call__
    loss_value = loss_obj(y_t, y_p, sample_weight=sw)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/losses.py:149 __call__
    losses = ag_call(y_true, y_pred)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/losses.py:253 call **
    return ag_fn(y_true, y_pred, **self._fn_kwargs)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
    return target(*args, **kwargs)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/losses.py:1566 sparse_categorical_crossentropy
    return K.sparse_categorical_crossentropy(
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
    return target(*args, **kwargs)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/backend.py:4790 sparse_categorical_crossentropy
    return array_ops.reshape(res, output_shape[:-1])
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
    return target(*args, **kwargs)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:195 reshape
    result = gen_array_ops.reshape(tensor, shape, name)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/ops/gen_array_ops.py:8233 reshape
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py:742 _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py:591 _create_op_internal
    return super(FuncGraph, self)._create_op_internal( # pylint: disable=protected-access
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:3477 _create_op_internal
    ret = Operation(
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1974 __init__
    self._c_op = _create_c_op(self._graph, node_def, inputs,
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1815 _create_c_op
    raise ValueError(str(e))

ValueError: Dimension size must be evenly divisible by 192 but is 8 for '{{node sparse_categorical_crossentropy_2/Reshape_2}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32](sparse_categorical_crossentropy_2/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits, sparse_categorical_crossentropy_2/strided_slice_1)' with input shapes: [8], [4] and with input tensors computed as partial shapes: input[1] = [2,8,12,?].
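As a side note on the numbers in this message: the reshape target has the partial shape [2, 8, 12, ?], and the tensor being reshaped has 8 elements. The short check below only reproduces that arithmetic. Reading 2 x 8 x 12 as (key/value pair) x (batch size) x (attention heads) is my assumption, not something the traceback states.

```python
from math import prod

# Known leading dimensions of the reshape target, taken from the error
# message: input[1] = [2, 8, 12, ?]. The last dimension is unknown.
known_dims = [2, 8, 12]

# Number of elements in the tensor being reshaped (input shape [8]).
loss_elements = 8

# A reshape with one unknown dimension only works if the element count
# is a multiple of the product of the known dimensions.
required_multiple = prod(known_dims)
can_reshape = loss_elements % required_multiple == 0

print(required_multiple)  # 192
print(can_reshape)        # False
```

If that reading is right, the loss is being applied to an output that is not the classification logits, which would be consistent with the fix below returning only logits.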
Fix: It looks like the call to TFGPT2ForSequenceClassification returns logits in shape (batch_size, sequence_length, num_labels), which is causing the above error.
After pooled_logits are computed, add the following line to extract the logits from the last step of the sequence:
pooled_logits = pooled_logits[:, -1, :]
and change
return TFSequenceClassifierOutputWithPast(
    loss=loss,
    logits=pooled_logits,
    past_key_values=transformer_outputs.past_key_values,
    hidden_states=transformer_outputs.hidden_states,
    attentions=transformer_outputs.attentions,
)
to
return TFSequenceClassifierOutputWithPast(
    logits=pooled_logits,
)
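To illustrate the shape change that the slicing performs, here is a small pure-Python sketch on nested lists. The function name last_step_logits and the toy values are made up for illustration; in the model itself the same step is the tensor slice pooled_logits[:, -1, :].

```python
from typing import List

# Nested lists standing in for a tensor of shape
# (batch_size, sequence_length, num_labels).
Logits3D = List[List[List[float]]]


def last_step_logits(logits: Logits3D) -> List[List[float]]:
    """Keep only the final sequence position, like logits[:, -1, :]."""
    return [sequence[-1] for sequence in logits]


# Toy input: batch_size=2, sequence_length=3, num_labels=2
logits = [
    [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]],
    [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]],
]

pooled = last_step_logits(logits)
print(pooled)                       # [[0.3, 0.7], [0.4, 0.6]]
print(len(pooled), len(pooled[0]))  # 2 2
```

The result has shape (batch_size, num_labels), which is what a sparse categorical cross-entropy loss expects alongside one label per example. Note that with right-padded batches the final position may hold a padding token, so this only sketches the shape change, not which position is best to pool.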
Expected behavior
Successful completion of training and evaluation
BART is a Seq2Seq model, and I’m not sure if we have a TF implementation of a sequence classifier head for it, unfortunately. You might have to build your own model, starting from TFBartModel and then adding a classifier head on top.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.