
How to use Adam instead of SGD?

See original GitHub issue

All the optimizers are defined as: optimizer = dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4). But I want to change it to Adam. How should I do that? Can anyone give me an example? Thanks!
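A minimal sketch of the change, assuming a config-driven framework (such as MMDetection) where the type name resolves to the matching torch.optim class; the learning rate shown is illustrative and usually needs retuning, as the comments below point out:

    # Minimal sketch, assuming 'Adam' resolves to torch.optim.Adam via the
    # framework's optimizer registry. momentum is an SGD-only argument;
    # Adam uses betas instead.
    optimizer = dict(type='Adam', lr=1e-3, betas=(0.9, 0.999), weight_decay=5e-4)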

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5

Top GitHub Comments

4 reactions
StringBottle commented, Dec 17, 2019

@GYee

When I used Adam instead of SGD, expecting faster training, the loss diverged to more than 1000, so I stopped the run. I haven’t gotten a satisfactory training result with Adam, but at least training was stable with SGD.

0 reactions
ZwwWayne commented, Mar 16, 2020

The loss diverged simply because the learning rate of 0.02 is too large for Adam. Try 1e-3 or 3e-4 and you will get reasonable results. However, the results are still much lower than those of SGD (bbox mAP 31.1 and segm mAP 28.6 with 3e-4 on Mask R-CNN using ResNet-50). We suggest more hyper-parameter tuning when using Adam, if Adam is unavoidable in your experiments.
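To make the learning-rate gap concrete, here is a hedged, plain-PyTorch sketch; the tiny placeholder model and exact values are illustrative only, not the Mask R-CNN setup discussed above:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)  # placeholder model standing in for the detector

    # SGD with the comparatively large learning rate mentioned in the comment.
    sgd = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9, weight_decay=5e-4)

    # Adam with the much smaller rate suggested above (1e-3 to 3e-4);
    # reusing lr=0.02 with Adam is what caused the divergence reported in this thread.
    adam = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=5e-4)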

Read more comments on GitHub >

Top Results From Across the Web

A 2021 Guide to improving CNNs-Optimizers: Adam vs SGD
For now, we could say that fine-tuned Adam is always better than SGD, while there exists a performance gap between Adam and SGD...
Read more >
Why not always use the ADAM optimization technique?
Adam is faster to converge. SGD is slower but generalizes better. So at the end it all depends on your particular circumstances.
Read more >
Adam vs. SGD: Closing the generalization gap on image ...
Adam finds solutions that generalize worse than those found by SGD [3, 4, 6]. Even when Adam achieves the same or lower...
Read more >
Why Should Adam Optimizer Not Be the Default Learning ...
To summarize, Adam definitely converges rapidly to a “sharp minima” whereas SGD is computationally heavy, converges to a “flat minima” but ...
Read more >
Deep Learning Optimizers. SGD with momentum, Adagrad ...
Adam optimizer is by far one of the most preferred optimizers. The idea behind Adam optimizer is to utilize the momentum concept from...
Read more >
