Fine-tuning TensorFlow DeBERTa fails on TPU
System Info
Latest version of transformers, Colab TPU, TensorFlow 2:
- Colab TPU
- transformers: 4.21.0
- tensorflow: 2.8.2 / 2.6.2
- Python 3.7
Who can help?
@LysandreJik, @Rocketknight1, @san
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I am facing some issues while trying to fine-tune a TensorFlow DeBERTa model (microsoft/deberta-v3-base) on TPU.
I have created some Colab notebooks showing the errors; a sketch of the common setup follows the list below. Note that the second and third notebooks already include some measures to circumvent the previous errors.
- ValueError with a partially known TensorShape after the latest take_along_axis change: FineTuning_TF_DeBERTa_TPU_1
- Output shape mismatch of branches with custom dropout: FineTuning_TF_DeBERTa_TPU_2
- XLA compilation error because of dynamic/computed tensor shapes: FineTuning_TF_DeBERTa_TPU_3
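For context, all three notebooks follow the standard Colab TPU fine-tuning pattern, roughly like the sketch below. The model name comes from this issue; the toy data, head size, and hyperparameters are placeholders, not the notebooks' actual pipeline:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Standard Colab TPU initialization
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

# Model creation and compilation must happen inside the strategy scope
with strategy.scope():
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "microsoft/deberta-v3-base", num_labels=2
    )
    # No explicit loss: recent transformers versions fall back to the
    # model's internal loss computation
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5))

# Placeholder data with fixed-length padding (TPU/XLA wants static shapes)
texts = ["a short example sentence"] * 64
labels = tf.constant([0, 1] * 32)
enc = tokenizer(texts, padding="max_length", max_length=128,
                truncation=True, return_tensors="np")
model.fit(dict(enc), labels, batch_size=8 * strategy.num_replicas_in_sync, epochs=1)
```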
I have seen similar issues when using microsoft/deberta-base.
I believe the following issues are related:
- TF2 DeBERTaV2 runs super slow on TPUs #18239
- Debertav2 debertav3 TPU : socket closed #18276. From this one I used the fix on take_along_axis.
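For reference, the take_along_axis fix discussed in #18276 essentially replaces the gather with a one-hot einsum, which keeps the shapes static for XLA. A minimal sketch of that idea (the exact patch in the linked issue may differ):

```python
import tensorflow as tf

def take_along_axis(x, indices):
    # Port of np.take_along_axis for the last axis only.
    # A gather-based implementation can yield partially known shapes that
    # break XLA/TPU compilation, so the gather is expressed as a one-hot
    # contraction instead.
    one_hot_indices = tf.one_hot(indices, depth=x.shape[-1], dtype=x.dtype)  # [B, S, P, D]
    return tf.einsum("ijkl,ijl->ijk", one_hot_indices, x)                    # [B, S, P]
```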
Thanks!
Expected behavior
Fine-tuning succeeds on TPU, as it does when using a GPU.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@gante
Great, setting the batch_size works 🥳. I only had to make sure that it is a multiple of strategy.num_replicas_in_sync: FineTuning_TF_DeBERTa_Working_Fix_TPU. Thanks a lot, I will test the procedure now on my real use case at hand.
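In code, that fix amounts to deriving the global batch size from the replica count. A minimal sketch, assuming a strategy and an unbatched train_ds already exist:

```python
# Global batch size as a multiple of the replica count (8 on a Colab TPU)
per_replica_batch_size = 16  # placeholder value
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

# drop_remainder=True keeps every batch the same static size, which TPUs require
train_ds = train_ds.batch(global_batch_size, drop_remainder=True)
model.fit(train_ds, epochs=1)
```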
Weird!
During my TPU and GPU tests, I was using a custom training loop instead of Keras's .fit(), which I'm not sure actually matters. In my custom training code, I got DeBERTa to train in an ELECTRA-style setup with XLA enabled via jit_compile=True, with none of the issues mentioned above. I will be sharing my code ASAP once I finish the pretraining and validate the results. It is based on the NVIDIA BERT and ELECTRA TF2 training code: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/LanguageModeling/
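The custom loop described above presumably has roughly the shape below (a generic sketch, assuming model, optimizer, and a batched train_ds with static shapes already exist; the actual NVIDIA-based pretraining code is more involved):

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # compile the whole step with XLA
def train_step(inputs, labels):
    with tf.GradientTape() as tape:
        logits = model(inputs, training=True).logits
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                labels, logits, from_logits=True
            )
        )
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# On TPU, the step would be dispatched with strategy.run inside a tf.function
for inputs, labels in train_ds:
    train_step(inputs, labels)
```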