How to set an appropriate learning rate?
    vit = ViT(
        image_size=448,
        patch_size=32,
        num_classes=180,
        dim=1024,
        depth=8,
        heads=8,
        mlp_dim=2048,
        dropout=0.5,
        emb_dropout=0.5
    ).cuda()
    optimizer = torch.optim.Adam(vit.parameters(), lr=5e-3, weight_decay=0.1)
I tried to train ViT on a 180-class dataset with the configuration shown above, but the loss does not decrease during training. Any suggestions?
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 7 (4 by maintainers)
Top Results From Across the Web

How to Configure the Learning Rate When Training Deep ...
An alternative approach is to perform a sensitivity analysis of the learning rate for the chosen model, also called a grid search ...

How to Decide on Learning Rate - Towards Data Science
Whenever one is starting with a new architecture or dataset, a single LR range test provides both a good LR value and ... (a sketch of such a range test follows this list)

Setting the learning rate of your neural network. - Jeremy Jordan
Another commonly employed technique, known as learning rate annealing, recommends starting with a relatively high learning rate and then ...

The Learning Rate - Andrea Perlato
Instead, a good (or good enough) learning rate must be discovered via trial and error. The range of values to consider for the ...

Choosing a learning rate - Data Science Stack Exchange
Setting learning rates for plain SGD in neural nets is usually a process of starting with a sane value such as 0.01 and ...
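The LR range test mentioned in the Towards Data Science result above can be sketched in a few lines of PyTorch. This is a minimal illustration, not code from the issue; model, train_loader, and criterion are assumed placeholders for the reader's own ViT, data loader, and loss function.

    import torch

    def lr_range_test(model, train_loader, criterion,
                      lr_start=1e-7, lr_end=1.0, num_steps=200):
        # Sweep the learning rate exponentially from lr_start to lr_end,
        # recording the loss at each step. A workable learning rate usually
        # sits roughly an order of magnitude below the point where the loss
        # curve starts to blow up.
        optimizer = torch.optim.Adam(model.parameters(), lr=lr_start)
        gamma = (lr_end / lr_start) ** (1.0 / num_steps)  # per-step multiplier
        lrs, losses = [], []
        data_iter = iter(train_loader)
        for _ in range(num_steps):
            try:
                images, labels = next(data_iter)
            except StopIteration:
                data_iter = iter(train_loader)
                images, labels = next(data_iter)
            images, labels = images.cuda(), labels.cuda()
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            lrs.append(optimizer.param_groups[0]["lr"])
            losses.append(loss.item())
            for group in optimizer.param_groups:
                group["lr"] *= gamma  # exponential ramp-up
        return lrs, losses

Plotting losses against lrs on a logarithmic x-axis makes the usable learning-rate range visible at a glance.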
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@sleeplessai I believe your dropout is way too high; I would set it to be 0.1 at most.

@sleeplessai You could also just wait until the paper has been reviewed, and fine-tune from the pre-trained model once that is released, probably by Google.
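Putting the first comment into code: a minimal sketch of the original configuration with dropout lowered to 0.1 and the Adam learning rate reduced to a more conventional starting point. The value 3e-4 is an assumption for illustration, not a number given in the thread; lowering emb_dropout alongside dropout is likewise an assumption.

    import torch
    from vit_pytorch import ViT

    vit = ViT(
        image_size=448,
        patch_size=32,
        num_classes=180,
        dim=1024,
        depth=8,
        heads=8,
        mlp_dim=2048,
        dropout=0.1,      # lowered from 0.5 per the maintainer's suggestion
        emb_dropout=0.1,  # assumed: lowered alongside dropout
    ).cuda()

    # 3e-4 is a common Adam starting point (an assumed value, not taken
    # from the thread); weight_decay is kept at the original 0.1.
    optimizer = torch.optim.Adam(vit.parameters(), lr=3e-4, weight_decay=0.1)

From there, an LR range test like the one sketched earlier can narrow the choice further for this particular dataset.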