GPT2 for classification - Errors encountered while running run_glue.py and (possible) fixes

See original GitHub issue

Here is a description of a series of errors I encountered while fine-tuning the pre-trained gpt2 model with run_glue.py (other users have reported these errors too). I also describe the code changes I made to fix them. If the maintainers of the codebase are happy with these changes, I will gladly submit them once someone shares the instructions for submitting a patch, getting it reviewed and checking it in.

Environment info

  • transformers version: 4.10.0.dev0
  • Platform: Linux-5.4.0-1051-azure-x86_64-with-glibc2.10
  • Python version: 3.8.1
  • PyTorch version (GPU?): 1.9.0
  • Tensorflow version (GPU?): 2.3.0
  • Using GPU in script?: Yes (1 gpu)
  • Using distributed or parallel set-up in script?:

Who can help

@patrickvonplaten, @sgugger, @patil-suraj

Model I am using (Bert, XLNet …): GPT2

The problem arises when using:

  • the official example scripts: examples/tensorflow/text-classification/run_glue.py

The task I am working on is:

  • an official GLUE/SQuAD task: GLUE

To reproduce

Steps to reproduce the behavior: (applicable to any GLUE classification task)

  1. python run_glue.py --model_name_or_path gpt2 --task_name sst2 --do_train --do_eval --do_predict --output_dir ./output

Error 1

  File "run_glue.py", line 567, in <module>
    main()
  File "run_glue.py", line 415, in main
    optimizer = tf.keras.optimizers.Adam(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/adam.py", line 115, in __init__
    super(Adam, self).__init__(name, **kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 335, in __init__
    raise ValueError("Gradient clipping in the optimizer "
ValueError: Gradient clipping in the optimizer (by setting clipnorm or clipvalue) is currently unsupported when using a distribution strategy.

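Fix: Don't set the clipnorm parameter, i.e. remove the following argument from the tf.keras.optimizers.Adam(...) call (this turns off gradient-norm clipping):

    clipnorm=training_args.max_grad_norm,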
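If you would rather keep gradient clipping in setups where it is allowed, a minimal sketch is to build the optimizer arguments conditionally. It assumes the TF 2.3 behaviour shown in the traceback (clipnorm is rejected whenever a tf.distribute strategy is active). The helper name adam_kwargs is hypothetical, and only the learning_rate and clipnorm arguments of the Adam call are shown.

```python
def adam_kwargs(learning_rate, max_grad_norm=None, in_strategy=False):
    """Hypothetical helper: keyword arguments for tf.keras.optimizers.Adam.

    Assumption: under TF 2.3, passing clipnorm while a tf.distribute
    strategy is active raises "Gradient clipping in the optimizer ... is
    currently unsupported", so clipnorm is left out in that case.
    """
    kwargs = {"learning_rate": learning_rate}
    if max_grad_norm is not None and not in_strategy:
        # Only request clipping when the optimizer will accept it.
        kwargs["clipnorm"] = max_grad_norm
    return kwargs


print(adam_kwargs(3e-5, 1.0, in_strategy=True))   # {'learning_rate': 3e-05}
print(adam_kwargs(3e-5, 1.0, in_strategy=False))  # {'learning_rate': 3e-05, 'clipnorm': 1.0}
```

In run_glue.py this could presumably be called as tf.keras.optimizers.Adam(**adam_kwargs(training_args.learning_rate, training_args.max_grad_norm, tf.distribute.has_strategy())). Since the script hit the error above, a strategy was active when the optimizer was created, so in that script the helper simply drops clipnorm, matching the fix.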

Error 2

ValueError: in user code:

    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:806 train_function  *
        return step_function(self, iterator)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:796 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/distribute/one_device_strategy.py:184 run
        return super(OneDeviceStrategy, self).run(fn, args, kwargs, options)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/distribute/one_device_strategy.py:367 _call_for_each_replica
        return fn(*args, **kwargs)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:789 run_step  **
        outputs = model.train_step(data)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:748 train_step
        loss = self.compiled_loss(
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/compile_utils.py:204 __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/losses.py:149 __call__
        losses = ag_call(y_true, y_pred)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/losses.py:253 call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/losses.py:1566 sparse_categorical_crossentropy
        return K.sparse_categorical_crossentropy(
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/keras/backend.py:4790 sparse_categorical_crossentropy
        return array_ops.reshape(res, output_shape[:-1])
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:195 reshape
        result = gen_array_ops.reshape(tensor, shape, name)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/ops/gen_array_ops.py:8233 reshape
        _, _, _op, _outputs = _op_def_library._apply_op_helper(
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py:742 _apply_op_helper
        op = g._create_op_internal(op_type_name, inputs, dtypes=None,
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py:591 _create_op_internal
        return super(FuncGraph, self)._create_op_internal(  # pylint: disable=protected-access
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:3477 _create_op_internal
        ret = Operation(
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1974 __init__
        self._c_op = _create_c_op(self._graph, node_def, inputs,
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1815 _create_c_op
        raise ValueError(str(e))

    ValueError: Dimension size must be evenly divisible by 192 but is 8 for '{{node sparse_categorical_crossentropy_2/Reshape_2}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32](sparse_categorical_crossentropy_2/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits, sparse_categorical_crossentropy_2/strided_slice_1)' with input shapes: [8], [4] and with input tensors computed as partial shapes: input[1] = [2,8,12,?].
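The numbers in this message can be checked by hand: the known dimensions of the requested shape [2,8,12,?] multiply to 2 * 8 * 12 = 192, while the tensor being reshaped has only 8 elements (input shape [8]), and 8 is not a multiple of 192. The small illustration below reproduces that arithmetic, using NumPy's reshape only as a stand-in for TensorFlow's Reshape op; the helper can_reshape is hypothetical.

```python
import numpy as np


def can_reshape(num_elements, shape):
    """Hypothetical helper: True if a flat array of num_elements fits `shape`.

    A -1 in `shape` plays the role of the unknown "?" dimension in the
    TensorFlow message; it only resolves if the element count is a
    multiple of the product of the known dimensions.
    """
    try:
        np.zeros(num_elements, dtype=np.float32).reshape(shape)
        return True
    except ValueError:
        return False


target_shape = (2, 8, 12, -1)               # input[1] = [2,8,12,?]
print(2 * 8 * 12)                           # 192
print(can_reshape(8, target_shape))         # False: 8 values, as in "[8]"
print(can_reshape(192 * 64, target_shape))  # True: a multiple of 192 fits
```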

Fix: It looks like the call to TFGPT2ForSequenceClassification returns logits of shape (batch_size, sequence_length, num_labels), which causes the above error.

After pooled_logits are computed, add the following line to extract the logits from the last step of the sequence:

    pooled_logits = pooled_logits[:, -1, :]
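The effect of that line can be sketched with NumPy arrays standing in for TF tensors (the sizes are illustrative): slicing with [:, -1, :] keeps one row of logits per example, turning a (batch_size, sequence_length, num_labels) array into (batch_size, num_labels), which is the shape a sparse categorical cross-entropy loss expects. Position -1 is the last real token only if sequences are not right-padded, so the sketch also shows a pad-aware variant that picks the last non-padding position. That variant, the helper name last_non_pad_logits and the pad id 0 are assumptions for illustration, not code from the issue or the library.

```python
import numpy as np

batch_size, seq_len, num_labels = 4, 6, 2
rng = np.random.default_rng(0)
logits = rng.normal(size=(batch_size, seq_len, num_labels))

# The fix from the issue: keep only the logits at the final position.
last_step = logits[:, -1, :]
print(last_step.shape)  # (4, 2)


def last_non_pad_logits(logits, input_ids, pad_token_id):
    """Hypothetical helper: per example, the logits at the last non-pad token.

    Assumes right padding with pad_token_id.
    """
    lengths = (input_ids != pad_token_id).sum(axis=-1) - 1
    # Advanced indexing on (batch, position) -> shape (batch, num_labels).
    return logits[np.arange(logits.shape[0]), lengths]


# Hypothetical right-padded token ids, using 0 as the pad id.
input_ids = np.array([
    [5, 7, 9, 0, 0, 0],
    [3, 4, 0, 0, 0, 0],
    [8, 8, 8, 8, 8, 8],
    [2, 0, 0, 0, 0, 0],
])
pooled = last_non_pad_logits(logits, input_ids, pad_token_id=0)
print(pooled.shape)  # (4, 2)
```

Both versions return the same shape; they differ only in which position is read for padded rows. Which one is appropriate depends on how the inputs are padded, which this sketch does not check.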

and change

    return TFSequenceClassifierOutputWithPast(
        loss=loss,
        logits=pooled_logits,
        past_key_values=transformer_outputs.past_key_values,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )

to

    return TFSequenceClassifierOutputWithPast(
        logits=pooled_logits,
    )

Expected behavior

Successful completion of training and evaluation

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

Rocketknight1 commented, Aug 31, 2021

BART is a Seq2Seq model, and I’m not sure if we have a TF implementation of a sequence classifier head for it, unfortunately. You might have to build your own model, starting from TFBartModel and then adding a classifier head on top.

github-actions[bot] commented, Oct 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
