
How to use Adam instead of SGD?

See original GitHub issue

All the optimizers are defined as: optimizer = dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4). But I want to change it to Adam. How should I do that? Can anyone give me an example? Thanks!
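A minimal sketch of the change, assuming a config-driven framework (such as MMDetection) where the type name resolves to the matching torch.optim class; the learning rate shown is illustrative and usually needs retuning, as the comments below point out:

    # Minimal sketch, assuming 'Adam' resolves to torch.optim.Adam via the
    # framework's optimizer registry. momentum is an SGD-only argument;
    # Adam uses betas instead.
    optimizer = dict(type='Adam', lr=1e-3, betas=(0.9, 0.999), weight_decay=5e-4)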

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5

Top GitHub Comments

4 reactions
StringBottle commented, Dec 17, 2019

@GYee

When I used Adam instead of SGD, expecting faster training, the loss diverged to more than 1000, so I stopped the run. I haven’t gotten a satisfactory training result with Adam, but at least training was stable with SGD.

0 reactions
ZwwWayne commented, Mar 16, 2020

The loss diverged simply because the learning rate of 0.02 is too large for Adam. Try 1e-3 or 3e-4 and you will get reasonable results. However, the results are still much lower than those of SGD (bbox mAP 31.1 and segm mAP 28.6 with 3e-4 on Mask R-CNN using ResNet-50). We suggest more hyper-parameter tuning when using Adam, if Adam is unavoidable in your experiments.
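To make the learning-rate gap concrete, here is a hedged, plain-PyTorch sketch; the tiny placeholder model and exact values are illustrative only, not the Mask R-CNN setup discussed above:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)  # placeholder model standing in for the detector

    # SGD with the comparatively large learning rate mentioned in the comment.
    sgd = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9, weight_decay=5e-4)

    # Adam with the much smaller rate suggested above (1e-3 to 3e-4);
    # reusing lr=0.02 with Adam is what caused the divergence reported in this thread.
    adam = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=5e-4)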

Read more comments on GitHub >

Top Results From Across the Web

A 2021 Guide to improving CNNs-Optimizers: Adam vs SGD
For now, we could say that fine-tuned Adam is always better than SGD, while there exists a performance gap between Adam and SGD...
Read more >
Why not always use the ADAM optimization technique?
Adam is faster to converge. SGD is slower but generalizes better. So at the end it all depends on your particular circumstances.
Read more >
Adam vs. SGD: Closing the generalization gap on image ...
Adam finds solutions that generalize worse than those found by SGD [3, 4, 6]. Even when Adam achieves the same or lower...
Read more >
Why Should Adam Optimizer Not Be the Default Learning ...
To summarize, Adam definitely converges rapidly to a “sharp minima” whereas SGD is computationally heavy, converges to a “flat minima” but ...
Read more >
Deep Learning Optimizers. SGD with momentum, Adagrad ...
Adam optimizer is by far one of the most preferred optimizers. The idea behind Adam optimizer is to utilize the momentum concept from...
Read more >
