
How to set an appropriate learning rate?

See original GitHub issue

vit = ViT(
    image_size=448,
    patch_size=32,
    num_classes=180,
    dim=1024,
    depth=8,
    heads=8,
    mlp_dim=2048,
    dropout=0.5,
    emb_dropout=0.5
).cuda()

optimizer = torch.optim.Adam(vit.parameters(), lr=5e-3, weight_decay=0.1)

I tried to train ViT on a 180-class dataset with the configuration shown above, but the loss does not decrease during training. Any suggestions on how to solve this?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
lucidrains commented, Oct 20, 2020

@sleeplessai I believe your dropout is way too high; I would set it to 0.1 at most.
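A minimal sketch of that adjustment, reusing the configuration from the issue. The lowered dropout follows the comment above; the reduced learning rate (3e-4) is an assumption of this sketch, not something stated in the thread.

import torch
from vit_pytorch import ViT

# Same config as in the issue, with dropout lowered per the suggestion above.
vit = ViT(
    image_size=448,
    patch_size=32,
    num_classes=180,
    dim=1024,
    depth=8,
    heads=8,
    mlp_dim=2048,
    dropout=0.1,      # was 0.5
    emb_dropout=0.1,  # was 0.5
).cuda()

# lr=3e-4 is an illustrative choice, not part of the original advice.
optimizer = torch.optim.Adam(vit.parameters(), lr=3e-4, weight_decay=0.1)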

0 reactions
lucidrains commented, Oct 21, 2020

@sleeplessai You could also just wait until the paper has been reviewed and fine-tune from the pre-trained model once that is released, probably by Google.

Read more comments on GitHub >

Top Results From Across the Web

How to Configure the Learning Rate When Training Deep ...
An alternative approach is to perform a sensitivity analysis of the learning rate for the chosen model, also called a grid search. This...
Read more >
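A minimal sketch of such a sensitivity analysis (grid search) over the learning rate, assuming hypothetical helpers build_model() and train_for_one_epoch() that return a freshly initialised model and the mean training loss, respectively:

import torch

candidate_lrs = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
losses = {}
for lr in candidate_lrs:
    model = build_model()  # hypothetical: fresh weights for every trial
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    losses[lr] = train_for_one_epoch(model, optimizer)  # hypothetical helper
best_lr = min(losses, key=losses.get)
print(f"Best learning rate by one-epoch training loss: {best_lr}")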
How to Decide on Learning Rate - Towards Data Science
Whenever one is starting with a new architecture or dataset, a single LR range test provides both a good LR value and a...
Read more >
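A minimal sketch of an LR range test along these lines, assuming model, criterion and train_loader already exist and that the loader yields at least num_steps batches; the bounds below are illustrative:

import torch

def lr_range_test(model, criterion, train_loader,
                  lr_min=1e-6, lr_max=1.0, num_steps=100):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_min)
    growth = (lr_max / lr_min) ** (1.0 / num_steps)  # exponential LR sweep
    history = []
    data_iter = iter(train_loader)
    lr = lr_min
    for step in range(num_steps):
        images, labels = next(data_iter)
        optimizer.zero_grad()
        loss = criterion(model(images.cuda()), labels.cuda())
        loss.backward()
        optimizer.step()
        lr *= growth
        for group in optimizer.param_groups:
            group["lr"] = lr
        history.append((lr, loss.item()))
    return history  # pick a value a bit below where the loss starts to blow up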
Setting the learning rate of your neural network. - Jeremy Jordan
Another commonly employed technique, known as learning rate annealing, recommends starting with a relatively high learning rate and then ...
Read more >
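A minimal sketch of learning-rate annealing with a built-in PyTorch scheduler, reusing the vit model from the issue; the starting rate, the cosine schedule, the epoch count and the train_for_one_epoch() helper are illustrative assumptions:

import torch

optimizer = torch.optim.Adam(vit.parameters(), lr=1e-3)  # start relatively high
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    train_for_one_epoch(vit, optimizer)  # hypothetical training helper
    scheduler.step()  # decay the learning rate after each epoch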
The Learning Rate - Andrea Perlato
Instead, a good (or good enough) learning rate must be discovered via trial and error. The range of values to consider for the...
Read more >
Choosing a learning rate - Data Science Stack Exchange
Setting learning rates for plain SGD in neural nets is usually a process of starting with a sane value such as 0.01 and...
Read more >
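A minimal sketch of that starting point, assuming model already exists; the momentum value is an illustrative assumption:

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# If the loss plateaus or diverges, lower lr (e.g. 3e-3, then 1e-3) and retrain.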
