How to set an appropriate learning rate?
    vit = ViT(
        image_size=448,
        patch_size=32,
        num_classes=180,
        dim=1024,
        depth=8,
        heads=8,
        mlp_dim=2048,
        dropout=0.5,
        emb_dropout=0.5
    ).cuda()
    optimizer = torch.optim.Adam(vit.parameters(), lr=5e-3, weight_decay=0.1)
I tried to train ViT on a 180-class dataset with the configuration shown above, but the loss does not decrease during training. Any suggestions?
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 7 (4 by maintainers)
Top Results From Across the Web

How to Configure the Learning Rate When Training Deep ...
An alternative approach is to perform a sensitivity analysis of the learning rate for the chosen model, also called a grid search ...

How to Decide on Learning Rate - Towards Data Science
Whenever one is starting with a new architecture or dataset, a single LR range test provides both a good LR value and ... (a sketch of such a range test follows this list)

Setting the learning rate of your neural network. - Jeremy Jordan
Another commonly employed technique, known as learning rate annealing, recommends starting with a relatively high learning rate and then ...

The Learning Rate - Andrea Perlato
Instead, a good (or good enough) learning rate must be discovered via trial and error. The range of values to consider for the ...

Choosing a learning rate - Data Science Stack Exchange
Setting learning rates for plain SGD in neural nets is usually a process of starting with a sane value such as 0.01 and ...
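The LR range test mentioned in the Towards Data Science result above can be sketched in a few lines of PyTorch. This is a minimal illustration, not code from the issue; model, train_loader, and criterion are assumed placeholders for the reader's own ViT, data loader, and loss function.

    import torch

    def lr_range_test(model, train_loader, criterion,
                      lr_start=1e-7, lr_end=1.0, num_steps=200):
        # Sweep the learning rate exponentially from lr_start to lr_end,
        # recording the loss at each step. A workable learning rate usually
        # sits roughly an order of magnitude below the point where the loss
        # curve starts to blow up.
        optimizer = torch.optim.Adam(model.parameters(), lr=lr_start)
        gamma = (lr_end / lr_start) ** (1.0 / num_steps)  # per-step multiplier
        lrs, losses = [], []
        data_iter = iter(train_loader)
        for _ in range(num_steps):
            try:
                images, labels = next(data_iter)
            except StopIteration:
                data_iter = iter(train_loader)
                images, labels = next(data_iter)
            images, labels = images.cuda(), labels.cuda()
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            lrs.append(optimizer.param_groups[0]["lr"])
            losses.append(loss.item())
            for group in optimizer.param_groups:
                group["lr"] *= gamma  # exponential ramp-up
        return lrs, losses

Plotting losses against lrs on a logarithmic x-axis makes the usable learning-rate range visible at a glance.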
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@sleeplessai I believe your dropout is way too high; I would set it to be 0.1 at most.

@sleeplessai You could also just wait until the paper has been reviewed, and fine-tune from the pre-trained model once that is released, probably by Google.
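Putting the first comment into code: a minimal sketch of the original configuration with dropout lowered to 0.1 and the Adam learning rate reduced to a more conventional starting point. The value 3e-4 is an assumption for illustration, not a number given in the thread; lowering emb_dropout alongside dropout is likewise an assumption.

    import torch
    from vit_pytorch import ViT

    vit = ViT(
        image_size=448,
        patch_size=32,
        num_classes=180,
        dim=1024,
        depth=8,
        heads=8,
        mlp_dim=2048,
        dropout=0.1,      # lowered from 0.5 per the maintainer's suggestion
        emb_dropout=0.1,  # assumed: lowered alongside dropout
    ).cuda()

    # 3e-4 is a common Adam starting point (an assumed value, not taken
    # from the thread); weight_decay is kept at the original 0.1.
    optimizer = torch.optim.Adam(vit.parameters(), lr=3e-4, weight_decay=0.1)

From there, an LR range test like the one sketched earlier can narrow the choice further for this particular dataset.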