Fine-tuning TensorFlow DeBERTa fails on TPU
System Info
Latest version of transformers, Colab TPU, TensorFlow 2:
- Colab TPU
- transformers: 4.21.0
- tensorflow: 2.8.2 / 2.6.2
- Python 3.7
Who can help?
@LysandreJik, @Rocketknight1, @san
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I am facing some issues while trying to fine-tune a TensorFlow DeBERTa model (microsoft/deberta-v3-base) on TPU.
I have created some Colab notebooks showing the errors; a sketch of the common setup follows the list below. Note that the second and third notebooks already include some measures to circumvent the previous errors.
- ValueError with a partially known TensorShape after the latest take_along_axis change: FineTuning_TF_DeBERTa_TPU_1
- Output shape mismatch of branches with custom dropout: FineTuning_TF_DeBERTa_TPU_2
- XLA compilation error because of dynamic/computed tensor shapes: FineTuning_TF_DeBERTa_TPU_3
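For context, all three notebooks follow the standard Colab TPU fine-tuning pattern, roughly like the sketch below. The model name comes from this issue; the toy data, head size, and hyperparameters are placeholders, not the notebooks' actual pipeline:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Standard Colab TPU initialization
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

# Model creation and compilation must happen inside the strategy scope
with strategy.scope():
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "microsoft/deberta-v3-base", num_labels=2
    )
    # No explicit loss: recent transformers versions fall back to the
    # model's internal loss computation
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5))

# Placeholder data with fixed-length padding (TPU/XLA wants static shapes)
texts = ["a short example sentence"] * 64
labels = tf.constant([0, 1] * 32)
enc = tokenizer(texts, padding="max_length", max_length=128,
                truncation=True, return_tensors="np")
model.fit(dict(enc), labels, batch_size=8 * strategy.num_replicas_in_sync, epochs=1)
```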
I have seen similar issues when using microsoft/deberta-base.
I believe the following issues are related:
- TF2 DeBERTaV2 runs super slow on TPUs #18239
- Debertav2 debertav3 TPU : socket closed #18276. From this one I used the fix on take_along_axis.
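For reference, the take_along_axis fix discussed in #18276 essentially replaces the gather with a one-hot einsum, which keeps the shapes static for XLA. A minimal sketch of that idea (the exact patch in the linked issue may differ):

```python
import tensorflow as tf

def take_along_axis(x, indices):
    # Port of np.take_along_axis for the last axis only.
    # A gather-based implementation can yield partially known shapes that
    # break XLA/TPU compilation, so the gather is expressed as a one-hot
    # contraction instead.
    one_hot_indices = tf.one_hot(indices, depth=x.shape[-1], dtype=x.dtype)  # [B, S, P, D]
    return tf.einsum("ijkl,ijl->ijk", one_hot_indices, x)                    # [B, S, P]
```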
Thanks!
Expected behavior
Fine-tuning succeeds on TPU, as it does when using a GPU.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@gante
Great, setting the batch_size works 🥳. I only had to make sure that it is a multiple of strategy.num_replicas_in_sync: FineTuning_TF_DeBERTa_Working_Fix_TPU. Thanks a lot, I will test the procedure now on my real use case at hand.
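In code, that fix amounts to deriving the global batch size from the replica count. A minimal sketch, assuming a strategy and an unbatched train_ds already exist:

```python
# Global batch size as a multiple of the replica count (8 on a Colab TPU)
per_replica_batch_size = 16  # placeholder value
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

# drop_remainder=True keeps every batch the same static size, which TPUs require
train_ds = train_ds.batch(global_batch_size, drop_remainder=True)
model.fit(train_ds, epochs=1)
```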
Weird!
During my TPU and GPU tests, I was using a custom training loop instead of Keras's .fit(), which I'm not sure actually matters. In my custom training code, I got DeBERTa to train in an ELECTRA-style setup with XLA enabled via jit_compile=True, with none of the issues mentioned above. I will be sharing my code ASAP once I finish the pretraining and validate the results. It is based on the NVIDIA BERT and ELECTRA TF2 training code: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/LanguageModeling/
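The custom loop described above presumably has roughly the shape below (a generic sketch, assuming model, optimizer, and a batched train_ds with static shapes already exist; the actual NVIDIA-based pretraining code is more involved):

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # compile the whole step with XLA
def train_step(inputs, labels):
    with tf.GradientTape() as tape:
        logits = model(inputs, training=True).logits
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                labels, logits, from_logits=True
            )
        )
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# On TPU, the step would be dispatched with strategy.run inside a tf.function
for inputs, labels in train_ds:
    train_step(inputs, labels)
```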