Fine-tuning TensorFlow DeBERTa fails on TPU

See original GitHub issue

System Info

Latest version of transformers on a Colab TPU with TensorFlow 2:

  • Colab TPU
  • transformers: 4.21.0
  • tensorflow: 2.8.2 / 2.6.2
  • Python 3.7

Who can help?

@LysandreJik, @Rocketknight1, @san

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

I am facing some issues while trying to fine-tune a TensorFlow DeBERTa model microsoft/deberta-v3-base on TPU.

I have created some Colab notebooks showing the errors. Note that the second and third notebooks already include some measures to work around the earlier errors.
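For orientation, the failing setup is roughly of the following shape. This is a condensed sketch, not the exact notebook code; the dataset, label count, and hyperparameters are placeholders, and only the TPU initialization and model loading follow the standard Colab / transformers pattern.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Standard Colab TPU initialization.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

with strategy.scope():
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "microsoft/deberta-v3-base", num_labels=2
    )
    # transformers TF models can be compiled without a loss; the model's
    # internal loss is used when labels are provided.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5))

# Toy data, only to exercise the training step.
enc = tokenizer(
    ["a short example sentence"] * 64,
    padding="max_length", max_length=128, truncation=True, return_tensors="np",
)
labels = [0] * 64
train_ds = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(
    16, drop_remainder=True
)

model.fit(train_ds, epochs=1)  # on TPU this is where the failure shows up
```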

I have seen similar issues when using microsoft/deberta-base.

I believe the following issues are related:

Thanks!

Expected behavior

Fine-tuning should work on TPU just as it does when using a GPU.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 17 (12 by maintainers)

Top GitHub Comments

2 reactions
tmoroder commented, Aug 10, 2022

@gante

Great, setting the batch_size works 🥳. I only had to make sure that it is divisible by strategy.num_replicas_in_sync (see FineTuning_TF_DeBERTa_Working_Fix_TPU). Thanks a lot, I will now test the procedure on my real use case.
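For concreteness, the constraint amounts to something like the following sketch (illustrative values, reusing strategy, model, enc, and labels from the sketch in the issue description above; on a Colab TPU v2-8, strategy.num_replicas_in_sync is 8):

```python
# The global batch size has to be a multiple of the number of TPU replicas,
# so each core receives an equal, statically shaped shard.
per_replica_batch_size = 8
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync  # 8 * 8 = 64 on a v2-8
assert global_batch_size % strategy.num_replicas_in_sync == 0

train_ds = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(
    global_batch_size, drop_remainder=True  # drop_remainder keeps shapes static on TPU
)

model.fit(train_ds, epochs=1)
```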

1 reaction
WissamAntoun commented, Aug 5, 2022

Weird!

During my TPU and GPU tests, I was using a custom training loop instead of Keras's .fit(), though I'm not sure whether that actually matters.

In my custom training code, I got DeBERTa to train in an ELECTRA-style setup with XLA enabled via jit_compile=True, with none of the issues mentioned above.

I will share my code as soon as I finish the pretraining and validate the results. It is based on NVIDIA's BERT and ELECTRA TF2 training code: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/LanguageModeling/
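As a rough illustration of that approach (this is not the commenter's actual code, which is based on the NVIDIA scripts linked above), an XLA-compiled custom training step under a distribution strategy can look like this:

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE
)

@tf.function(jit_compile=True)  # compile the whole step with XLA
def train_step(model, optimizer, features, labels):
    with tf.GradientTape() as tape:
        logits = model(features, training=True).logits
        per_example_loss = loss_fn(labels, logits)
        # Average over the global batch when running under a distribution strategy.
        loss = tf.nn.compute_average_loss(per_example_loss)
    grads = tape.gradient(loss, model.trainable_variables)
    # In practice the optimizer's slot variables may need to be built before
    # the first XLA-compiled call.
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Driven from the outer loop, e.g.:
#   for features, labels in dist_dataset:
#       strategy.run(train_step, args=(model, optimizer, features, labels))
```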

Read more comments on GitHub.

