
Large models don't converge while fine-tuning

See original GitHub issue

I tried to fine-tune the XLM-RoBERTa Large model in a Google Colab environment for 3 epochs with a learning rate of 1e-5, a batch size of 16, 2 gradient accumulation steps, and 120 warmup steps. But the loss didn't converge, and the model gives random predictions after fine-tuning.

I used the minimal sentence-pair example as a starting point.

Do you have any ideas?
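
For reference, the setup described above might look roughly like the following. This is a minimal sketch using the hyperparameters stated in the issue; the model class, optimizer, scheduler, and the `train_dataset` placeholder are assumptions for illustration, not the exact example script that was used.

```python
import torch
from torch.utils.data import DataLoader
from transformers import (
    XLMRobertaForSequenceClassification,
    XLMRobertaTokenizer,
    get_linear_schedule_with_warmup,
)

model = XLMRobertaForSequenceClassification.from_pretrained(
    "xlm-roberta-large", num_labels=2
)
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-large")

epochs, batch_size, accum_steps, warmup_steps, lr = 3, 16, 2, 120, 1e-5

# train_dataset is a placeholder for the tokenized sentence-pair dataset.
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
total_steps = (len(train_loader) // accum_steps) * epochs
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)

model.train()
for epoch in range(epochs):
    for step, batch in enumerate(train_loader):
        # Scale the loss so accumulated gradients match an effective batch of 32.
        loss = model(**batch).loss / accum_steps
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
```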

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 4
  • Comments: 19 (11 by maintainers)

Top GitHub Comments

1 reaction
antonyscerri commented, Feb 10, 2020

In my case, carefully adjusting the learning rate (with the existing scheduler) along with the number of epochs (as well as the earlier increase in batch size) allowed me to get much better results, beating those of the base model to date. So there doesn't appear to be anything fundamentally wrong with the pre-trained model or the core model code. It seems you need to do a broader sweep of the parameters in your case (assuming no data issues, etc.).
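
A broader sweep of the kind suggested here can be as simple as re-running the fine-tuning over a small grid of learning rates and epoch counts and keeping the best dev-set score. A sketch follows; `run_finetuning` is a hypothetical wrapper around the training loop shown earlier, not code from the issue, and the grid values are only examples.

```python
# Re-run the same fine-tuning over a small grid and keep the best dev score.
# run_finetuning is a hypothetical wrapper around the training loop above.
best = None
for lr in (5e-6, 1e-5, 2e-5, 3e-5):
    for epochs in (3, 5, 10):
        dev_score = run_finetuning(lr=lr, epochs=epochs, batch_size=32)
        if best is None or dev_score > best[0]:
            best = (dev_score, lr, epochs)
print("best dev score %.4f with lr=%g, epochs=%d" % best)
```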

1 reaction
kinoute commented, Jan 29, 2020

I understand that. But you should add checkpoints within epochs. This won't improve your model, but it will give you more insight into what's really going on, since you will get far more metrics at different points in the process.

I'm about to re-run XLM-RoBERTa on my binary classifier and will report back with what worked for me.
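
Checkpointing within epochs can be done by evaluating and saving every fixed number of optimizer steps rather than only at epoch boundaries. The sketch below builds on the training loop shown earlier; `evaluate`, `dev_loader`, and the `eval_every` value are assumed helpers for illustration, not from the original thread.

```python
# Evaluate and checkpoint every `eval_every` optimizer steps to see how the
# loss and dev metrics evolve mid-epoch.
eval_every, global_step = 200, 0
for epoch in range(epochs):
    for step, batch in enumerate(train_loader):
        loss = model(**batch).loss / accum_steps
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
            global_step += 1
            if global_step % eval_every == 0:
                dev_loss, dev_acc = evaluate(model, dev_loader)
                print(f"step {global_step}: dev_loss={dev_loss:.4f} dev_acc={dev_acc:.4f}")
                model.save_pretrained(f"checkpoint-{global_step}")
```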

Read more comments on GitHub >

Top Results From Across the Web

Advanced Techniques for Fine-tuning Transformers
Learn these techniques for fine-tuning BERT, RoBERTa, etc.: layer-wise learning rate decay (LLRD), warm-up steps, re-initializing layers ...
Read more >
Transfer learning and fine-tuning | TensorFlow Core
It is critical to only do this step after the model with frozen layers has been trained to convergence. If you mix randomly-initialized ...
Read more >
How To Fit a Bigger Model and Train It Faster - Hugging Face
However, a larger batch size can often result in faster model convergence or better end performance. So ideally we want to tune the...
Read more >
Fine-tuning your model | Chan's Jupyter
C controls the inverse of the regularization strength, and this is what you will tune in this exercise. A large C...
Read more >
Models that converged before aren't converging anymore in ...
I can even load the saved model and weights that work. When I train more with the exact same model, the performance actually...
Read more >
