Large models don't converge while fine-tuning
See original GitHub issue
I tried to fine-tune the XLM-RoBERTa Large model in a Google Colab environment for 3 epochs with a 1e-5 learning rate, a batch size of 16, 2 gradient accumulation steps, and 120 warmup steps. But the loss didn't converge, and the model gives random predictions after fine-tuning.
I used the sentence pair minimal example as a starting point.
Do you have any idea?
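For reference, here is a minimal sketch of the reported setup (3 epochs, 1e-5 learning rate, batch size 16, 2 gradient accumulation steps, 120 warmup steps) using the Hugging Face Trainer API. The tiny in-memory sentence pairs, the label count, and the output directory are illustrative placeholders, not the actual example the issue refers to.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny placeholder sentence-pair dataset; replace with the real task data.
raw = Dataset.from_dict({
    "sentence1": ["A man is playing a guitar.", "The sky is blue."],
    "sentence2": ["Someone plays an instrument.", "The ocean is dry."],
    "label": [1, 0],
})

def tokenize(batch):
    # Encode the two sentences as a single pair input for XLM-R.
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length", max_length=128)

dataset = raw.map(tokenize, batched=True)

# The configuration reported above.
args = TrainingArguments(
    output_dir="xlmr-large-pairs",      # hypothetical output directory
    num_train_epochs=3,
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    warmup_steps=120,
    logging_steps=10,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```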
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 4
- Comments:19 (11 by maintainers)
Top Results From Across the Web

Advanced Techniques for Fine-tuning Transformers
Learn these techniques for fine-tuning BERT, RoBERTa, etc.: Layer-wise Learning Rate Decay (LLRD), warm-up steps, re-initializing layers ... (see the LLRD sketch after these results)

Transfer learning and fine-tuning | TensorFlow Core
It is critical to only do this step after the model with frozen layers has been trained to convergence. If you mix randomly-initialized ...

How To Fit a Bigger Model and Train It Faster - Hugging Face
However, a larger batch size can often result in faster model convergence or better end performance. So ideally we want to tune the ...

Fine-tuning your model | Chan`s Jupyter
C controls the inverse of the regularization strength, and this is what you will tune in this exercise. A large C ...

Models that converged before aren't converging anymore in ...
I can even load the saved model and weights that work. When I train more with the exact same model, the performance actually ...
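The first result above mentions Layer-wise Learning Rate Decay (LLRD), i.e. giving lower layers smaller learning rates than upper layers and the head. A minimal sketch of that idea for XLM-RoBERTa Large with plain PyTorch AdamW parameter groups follows; the attribute names assume the Hugging Face XLMRobertaForSequenceClassification layout, and the base learning rate and decay factor are illustrative assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-large", num_labels=2)

base_lr = 1e-5   # illustrative base learning rate for the classifier head
decay = 0.9      # illustrative per-layer decay factor

# Layer-wise learning rate decay: walk the encoder from the top layer down,
# shrinking the learning rate at each step; the embeddings end up smallest.
layers = [model.roberta.embeddings] + list(model.roberta.encoder.layer)
param_groups = [{"params": model.classifier.parameters(), "lr": base_lr}]

lr = base_lr
for layer in reversed(layers):   # top encoder layer first, embeddings last
    lr *= decay
    param_groups.append({"params": layer.parameters(), "lr": lr})

optimizer = torch.optim.AdamW(param_groups, lr=base_lr)
```

The resulting optimizer can then be used in a custom training loop, or handed to the Trainer through its `optimizers` argument, in place of the default flat-learning-rate AdamW.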
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
In my case, carefully adjusting the learning rate (with the existing scheduler) along with the number of epochs (on top of the earlier increase in batch size) allowed me to get much better results (beating my results with the base model to date). So there doesn't appear to be anything fundamentally wrong with the pre-trained model or the core model code. It seems you need to do a broader sweep of the parameters in your case (assuming there are no data issues, etc.).
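To illustrate what such a broader sweep could look like, here is a small grid over learning rate and epoch count. `run_finetuning` is a hypothetical stand-in for a full training run that returns a validation metric; it is not code from this thread, and the grid values are illustrative.

```python
import itertools

def run_finetuning(learning_rate: float, num_epochs: int) -> float:
    """Hypothetical placeholder: fine-tune XLM-R Large with these settings
    and return a validation metric (e.g. accuracy)."""
    return 0.0  # replace with a real training + evaluation run

# Grid over the two knobs mentioned in the comment above.
learning_rates = [5e-6, 1e-5, 2e-5]
epoch_counts = [3, 5, 10]

best_score, best_config = float("-inf"), None
for lr, epochs in itertools.product(learning_rates, epoch_counts):
    score = run_finetuning(lr, epochs)
    if score > best_score:
        best_score, best_config = score, (lr, epochs)

print(f"best score {best_score:.4f} with lr={best_config[0]}, epochs={best_config[1]}")
```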
I understand that. But you should add checkpoints within epochs. This won't improve your model, but it will give you more insight into what's really going on, since you will get far more metrics at different points in the process (see the sketch after this comment).
I'm about to re-run XLM-RoBERTa on my binary classifier and will report what worked for me.
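As a sketch of the in-epoch checkpointing suggested above, step-based evaluation, saving, and logging in TrainingArguments produce metrics several times per epoch instead of only at epoch boundaries. The step counts and output directory are illustrative, and the argument names assume a reasonably recent version of transformers.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="xlmr-large-pairs",     # hypothetical output directory
    num_train_epochs=3,
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    warmup_steps=120,
    evaluation_strategy="steps",       # evaluate every eval_steps optimizer steps
    eval_steps=200,                    # illustrative; pick something well below one epoch
    save_strategy="steps",
    save_steps=200,
    logging_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)
```

Passing an `eval_dataset` to the Trainer with these arguments yields loss and metric points every 200 steps, which makes a flat or diverging loss visible mid-epoch.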