Issues with TFGPT2ForSequenceClassification
Environment info
- transformers version: 4.5.1
- Platform: Google Colab
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?): 2.4.1
- Using GPU in script?: No, but TensorFlow uses it automatically
- Using distributed or parallel set-up in script?:
Who can help
@patrickvonplaten, @LysandreJik, @Rocketknight1
Information
Model I am using (GPT2):
The problem arises when using:
- my own modified scripts: (give details below)

When using TFGPT2ForSequenceClassification, I found that the structure of the model looks odd (see below): why is the classifier inserted before the GPT main layer? When I load the PyTorch version, it looks different, with the classifier inserted after the main layer.

I also tried to train this model the way the tutorial on fine-tuning BERT with a custom dataset suggests, but it failed as shown below. I loaded the pretrained classification model with 3 classes:

ValueError: in user code:
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function *
    return step_function(self, iterator)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:795 step_function **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
    return fn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:788 run_step **
    outputs = model.train_step(data)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:758 train_step
    self.compiled_metrics.update_state(y, y_pred, sample_weight)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/compile_utils.py:408 update_state
    metric_obj.update_state(y_t, y_p, sample_weight=mask)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:90 decorated
    update_op = update_state_fn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:177 update_state_fn
    return ag_update_state(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:618 update_state **
    matches = ag_fn(y_true, y_pred, **self._fn_kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
    return target(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:3315 sparse_categorical_accuracy
    return math_ops.cast(math_ops.equal(y_true, y_pred), K.floatx())
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
    return target(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/math_ops.py:1679 equal
    return gen_math_ops.equal(x, y, name=name)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_math_ops.py:3179 equal
    name=name)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
    attrs=attr_protos, op_def=op_def)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py:592 _create_op_internal
    compute_device)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py:3536 _create_op_internal
    op_def=op_def)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py:2016 __init__
    control_input_ops, op_def)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py:1856 _create_c_op
    raise ValueError(str(e))

ValueError: Dimensions must be equal, but are 3 and 512 for '{{node Equal}} = Equal[T=DT_FLOAT, incompatible_shape_error=true](Cast_1, Cast_2)' with input shapes: [?,3], [?,512].
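One way to read this error, offered as a sketch only: the last Keras frame is sparse_categorical_accuracy, which takes an argmax over the last axis of the predictions before the equality check. If the predictions reached the metric as per-token logits of shape (batch, 512, 3) rather than pooled logits of shape (batch, 3), the argmax would leave (batch, 512), which cannot be compared elementwise with labels of shape (batch, 3). That input shape is an assumption; the traceback does not confirm it. The helper names below are hypothetical and model only the shape arithmetic in plain Python.

```python
# Shape-only model of the failing comparison; no TensorFlow required.
# ASSUMPTION: predictions are per-token logits of shape (batch, 512, 3).

def argmax_last_axis_shape(shape):
    """Shape left after an argmax over the last axis."""
    return tuple(shape[:-1])

def elementwise_compatible(a, b):
    """Right-aligned broadcasting rule: dims must match or one must be 1."""
    for x, y in zip(reversed(a), reversed(b)):
        if x != y and x != 1 and y != 1:
            return False
    return True

batch = 10                      # batch size used in the report
y_true_shape = (batch, 3)       # multi-hot labels
y_pred_shape = (batch, 512, 3)  # assumed unpooled logits

reduced = argmax_last_axis_shape(y_pred_shape)
print(reduced)                                        # (10, 512)
print(elementwise_compatible(y_true_shape, reduced))  # False
```

Under this reading, the 3 and 512 in the message are the label count and the padded sequence length.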
The task I am working on is:
- my own task or dataset: (give details below)

The task is a multi-label classification task, where the label of each sample can be represented as a 3-dimensional vector such as [0,0,0], [0,1,0], or [1,1,0].
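To make the label format concrete, here is a minimal sketch of building such 3-dimensional multi-hot vectors from sets of active label indices. The function name and the index-based input are assumptions for illustration; the report does not say how y_train was actually built.

```python
NUM_LABELS = 3  # matches num_labels=3 in the report

def to_multi_hot(active_labels, num_labels=NUM_LABELS):
    """Turn a collection of label indices into a fixed-length 0/1 vector."""
    vector = [0] * num_labels
    for index in active_labels:
        if not 0 <= index < num_labels:
            raise ValueError(f"label index {index} out of range")
        vector[index] = 1
    return vector

print(to_multi_hot([]))      # [0, 0, 0]
print(to_multi_hot([1]))     # [0, 1, 0]
print(to_multi_hot([0, 1]))  # [1, 1, 0]
```

Because several positions can be 1 at once, each position is an independent yes/no target, which is why a binary cross-entropy loss is a natural fit for this label layout.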
To reproduce
Steps to reproduce the behavior:
- load the GPT2Tokenizer, TFGPT2ForSequenceClassification with num_labels=3
my_gpt_tokenizer = GPT2TokenizerFast.from_pretrained('openai-gpt')
my_gpt_model = TFGPT2ForSequenceClassification.from_pretrained('openai-gpt',num_labels=3)
- add a pad token to the tokenizer, tokenize the text as the tutorials do, and convert the encodings into dataset objects
my_gpt_tokenizer.add_special_tokens({'pad_token': '[PAD]'})
gpt_train_encodings = my_gpt_tokenizer(X_train, truncation=True, padding=True)
gpt_test_encodings = my_gpt_tokenizer(X_test, truncation=True, padding=True)
gpt_train_dataset = tf.data.Dataset.from_tensor_slices((dict(gpt_train_encodings),y_train))
gpt_test_dataset = tf.data.Dataset.from_tensor_slices((dict(gpt_test_encodings),y_test))
- train the model:
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
my_gpt_model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=['accuracy'])
history = my_gpt_model.fit(gpt_train_dataset.shuffle(500).batch(10), epochs=2, batch_size=10, validation_data=gpt_test_dataset.batch(10))
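On why the pad token in the second step matters: according to the Transformers documentation, GPT-2 sequence classification classifies from the last token of each row, and it finds the last non-padding token through the pad_token_id defined in the model configuration. The plain-Python sketch below illustrates that pooling idea only. It is not the library's code; PAD_ID, the helper names, and the sample values are hypothetical, and right padding is assumed.

```python
PAD_ID = 0  # hypothetical pad token id

def last_token_index(ids, pad_id=PAD_ID):
    """Index of the last non-pad token, assuming right padding."""
    return sum(1 for token in ids if token != pad_id) - 1

def pool_last_token(per_token_logits, input_ids, pad_id=PAD_ID):
    """Pick one logits vector per sequence: (batch, seq, labels) -> (batch, labels)."""
    return [
        seq_logits[last_token_index(ids, pad_id)]
        for seq_logits, ids in zip(per_token_logits, input_ids)
    ]

input_ids = [
    [5, 6, 7, 0],   # three real tokens, one pad
    [8, 9, 0, 0],   # two real tokens, two pads
]
per_token_logits = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.0, 0.0, 0.0]],
    [[1.1, 1.2, 1.3], [1.4, 1.5, 1.6], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]

pooled = pool_last_token(per_token_logits, input_ids)
print(pooled)  # [[0.7, 0.8, 0.9], [1.4, 1.5, 1.6]]
```

In this sketch the pooled result has one row per sequence and one column per label, which is the shape a (batch, 3) label tensor can be compared against.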
Expected behavior
The model should train successfully, as the BERT classification models do. I tried the same code with TFBertForSequenceClassification and TFDistilBertForSequenceClassification, and both trained without errors.
Top GitHub Comments
Hi @cytwill, can you share a few lines of the data you’re loading as X_train and y_train? If it’s a private dataset, you can replace the text with random text - I just want to see the format of the data and try to reproduce the error here.
Hi. I am currently experiencing the same issue as the OP where the classification layer seems to be inserted before the main GPT layer. I basically have the same model summary and a similar error so I thought I’d try to reopen this.
I know it’s not an ideal dataset for the model, but here’s a copy of the Fine Tuning with Keras tutorial to illustrate the problem: https://colab.research.google.com/drive/1UJdB5QG_6L1qeWxM8Fa-CuDZQR32cshL?usp=sharing
In that notebook, the PyTorch version follows the TensorFlow implementation and seems to work well enough.