
Issues with TFGPT2ForSequenceClassification


Environment info

  • transformers version: 4.5.1
  • Platform: Google Colab
  • Python version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?): 2.4.1
  • Using GPU in script?: No, but TensorFlow uses it automatically
  • Using distributed or parallel set-up in script?:

Who can help

@patrickvonplaten, @LysandreJik, @Rocketknight1

Information

Model I am using: GPT-2

The problem arises when using:

  • my own modified scripts: (give details below)

When using TFGPT2ForSequenceClassification, I found the structure of the model strange: in the model summary, the classifier appears before the GPT main layer [screenshot: TF model summary]. Why is the classifier inserted before the GPT main layer? When I load the PyTorch version, it looks different, with the classifier inserted after the main layer [screenshot: PyTorch model summary].

I also tried to train this model as the tutorial on fine-tuning BERT with a custom dataset suggests. I loaded the pretrained classification model with 3 classes, but training failed with the following error:

ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:795 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:788 run_step  **
        outputs = model.train_step(data)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:758 train_step
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/compile_utils.py:408 update_state
        metric_obj.update_state(y_t, y_p, sample_weight=mask)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:90 decorated
        update_op = update_state_fn(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:177 update_state_fn
        return ag_update_state(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:618 update_state  **
        matches = ag_fn(y_true, y_pred, **self._fn_kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:3315 sparse_categorical_accuracy
        return math_ops.cast(math_ops.equal(y_true, y_pred), K.floatx())
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/math_ops.py:1679 equal
        return gen_math_ops.equal(x, y, name=name)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_math_ops.py:3179 equal
        name=name)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py:592 _create_op_internal
        compute_device)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py:3536 _create_op_internal
        op_def=op_def)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py:2016 __init__
        control_input_ops, op_def)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py:1856 _create_c_op
        raise ValueError(str(e))

    ValueError: Dimensions must be equal, but are 3 and 512 for '{{node Equal}} = Equal[T=DT_FLOAT, incompatible_shape_error=true](Cast_1, Cast_2)' with input shapes: [?,3], [?,512].
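Editor's note on the summary screenshots (my own observation, not from the thread): for subclassed Keras models such as TFGPT2ForSequenceClassification, summary() lists sub-layers in the order they were created in __init__, not in the order data flows through call(). So the classifier appearing above the transformer block does not by itself mean it executes first. A minimal sketch to check this, assuming the standard 'gpt2' checkpoint (the OP loaded 'openai-gpt'):

from transformers import TFGPT2ForSequenceClassification

# Assumed checkpoint, for illustration only.
model = TFGPT2ForSequenceClassification.from_pretrained('gpt2', num_labels=3)
model.summary()              # sub-layers listed in creation order, not call order
for layer in model.layers:   # same ordering here
    print(layer.name)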

The task I am working on is:

  • my own task or dataset: (give details below) The task is multi-label classification, where the label of each sample is a 3-dimensional multi-hot vector such as [0,0,0], [0,1,0], or [1,1,0]; a hypothetical sample in this format is sketched below.
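Since the actual dataset was never shared in the thread, here is a hypothetical stand-in for X_train / y_train matching the described multi-hot format; every text and label below is invented for illustration:

import numpy as np

# Hypothetical stand-in data; the real dataset was not shared in the issue.
X_train = ["first example text", "second example text", "third example text"]
y_train = np.array([[0, 1, 0],
                    [1, 1, 0],
                    [0, 0, 0]], dtype=np.float32)  # 3-dim multi-hot labels
X_test = ["held-out example text"]
y_test = np.array([[1, 0, 0]], dtype=np.float32)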

To reproduce

Steps to reproduce the behavior:

  1. Load the GPT2TokenizerFast and TFGPT2ForSequenceClassification with num_labels=3:

from transformers import GPT2TokenizerFast, TFGPT2ForSequenceClassification
import tensorflow as tf

my_gpt_tokenizer = GPT2TokenizerFast.from_pretrained('openai-gpt')
my_gpt_model = TFGPT2ForSequenceClassification.from_pretrained('openai-gpt', num_labels=3)

  2. Add a pad token to the tokenizer, tokenize the text as the tutorial does, and turn the encodings into dataset objects:

my_gpt_tokenizer.add_special_tokens({'pad_token': '[PAD]'})
gpt_train_encodings = my_gpt_tokenizer(X_train, truncation=True, padding=True)
gpt_test_encodings = my_gpt_tokenizer(X_test, truncation=True, padding=True)
gpt_train_dataset = tf.data.Dataset.from_tensor_slices((dict(gpt_train_encodings), y_train))
gpt_test_dataset = tf.data.Dataset.from_tensor_slices((dict(gpt_test_encodings), y_test))

  3. Train the model:

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
my_gpt_model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=['accuracy'])
history = my_gpt_model.fit(gpt_train_dataset.shuffle(500).batch(10), epochs=2, batch_size=10, validation_data=gpt_test_dataset.batch(10))
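Two things are worth noting about these steps (editor's observations, not confirmed fixes from the thread). First, GPT-2 checkpoints ship without a pad token, and the sequence-classification head uses config.pad_token_id to locate the last non-padding token, so after adding '[PAD]' to the tokenizer the model usually needs to be told about it as well. Second, the string metric 'accuracy' lets Keras infer a concrete metric from the label shapes, and the traceback above shows it resolved to sparse_categorical_accuracy, which is wrong for multi-hot labels. A hedged sketch of both adjustments:

# Sketch of two possible adjustments, assuming transformers 4.x / TF 2.4
# behavior; neither is a confirmed fix from this thread.

# 1. Register the new pad token with the model, not just the tokenizer:
my_gpt_model.resize_token_embeddings(len(my_gpt_tokenizer))
my_gpt_model.config.pad_token_id = my_gpt_tokenizer.pad_token_id

# 2. Name the loss and metric explicitly so Keras cannot infer
#    sparse_categorical_accuracy for multi-hot labels; the model
#    outputs logits, hence from_logits=True and threshold=0.0:
my_gpt_model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)],
)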

Expected behavior

The model should train successfully, just as the BERT classification models do. I tried the same code with TFBertForSequenceClassification and TFDistilBertForSequenceClassification, and both trained successfully; a sketch of that working baseline follows.
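For reference, the working baseline implied above would look like this, reconstructed from the steps in "To reproduce"; the exact checkpoint name is an assumption, since the issue does not say which BERT weights were used:

from transformers import BertTokenizerFast, TFBertForSequenceClassification

# Assumed checkpoint; the issue does not specify which BERT weights were used.
bert_tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
bert_model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
# ...then tokenize, build the tf.data.Dataset objects, compile, and fit
# exactly as in steps 2-3 above. Note that BERT already defines a pad token,
# which is one relevant difference from GPT-2.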

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
Rocketknight1 commented, May 3, 2021

Hi @cytwill, can you share a few lines of the data you’re loading as X_train and y_train? If it’s a private dataset, you can replace the text with random text - I just want to see the format of the data and try to reproduce the error here.

0 reactions
rcmcabral commented, Feb 23, 2022

Hi. I am currently experiencing the same issue as the OP, where the classification layer seems to be inserted before the main GPT layer. I have essentially the same model summary and a similar error, so I thought I'd try to reopen this.

I know it’s not an ideal dataset for the model but here’s a copy of the Fine Tuning with Keras tutorial to illustrate the problem: https://colab.research.google.com/drive/1UJdB5QG_6L1qeWxM8Fa-CuDZQR32cshL?usp=sharing

Below the TensorFlow implementation in the notebook is the PyTorch version, which seems to work well enough; a sketch of what that counterpart could look like follows.
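For readers without access to the notebook, here is a minimal sketch of what such a working PyTorch counterpart could look like; the checkpoint, data, and loss below are editor assumptions, not the notebook's actual contents:

import torch
from transformers import GPT2TokenizerFast, GPT2ForSequenceClassification

# Assumed checkpoint and setup; not taken from the linked notebook.
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model = GPT2ForSequenceClassification.from_pretrained('gpt2', num_labels=3)
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id

enc = tokenizer(["placeholder text"], truncation=True, padding=True, return_tensors='pt')
logits = model(**enc).logits                       # shape: (batch_size, 3)
labels = torch.tensor([[0., 1., 0.]])              # multi-hot target
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()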
