
Learning Rate is increasing instead of annealing

See original GitHub issue

❓ Questions and Help

Hey! Amazing work. I’ve struggled a couple of times with tuning the previous version of Detectron, but this new write-up works really well and lets me tune with a minimum of time and energy.

A question, though: why is the learning rate increasing instead of annealing step by step? As shown below:

[10/24 10:09:06 d2.engine.train_loop]: Starting training from iteration 0
[10/24 10:09:18 d2.utils.events]: eta: 0:18:36  iter: 19  total_loss: 1.902  loss_cls: 1.429  loss_box_reg: 0.487  time: 0.5684  data_time: 0.0405  lr: 0.000002  max_mem: 10456M
[10/24 10:09:29 d2.utils.events]: eta: 0:18:28  iter: 39  total_loss: 1.930  loss_cls: 1.435  loss_box_reg: 0.491  time: 0.5692  data_time: 0.0068  lr: 0.000004  max_mem: 10456M
[10/24 10:09:41 d2.utils.events]: eta: 0:18:17  iter: 59  total_loss: 1.896  loss_cls: 1.399  loss_box_reg: 0.500  time: 0.5695  data_time: 0.0078  lr: 0.000006  max_mem: 10456M
[10/24 10:09:52 d2.utils.events]: eta: 0:18:06  iter: 79  total_loss: 1.918  loss_cls: 1.416  loss_box_reg: 0.494  time: 0.5697  data_time: 0.0060  lr: 0.000008  max_mem: 10456M
[10/24 10:10:04 d2.utils.events]: eta: 0:17:55  iter: 99  total_loss: 1.888  loss_cls: 1.392  loss_box_reg: 0.496  time: 0.5695  data_time: 0.0071  lr: 0.000010  max_mem: 10456M
[10/24 10:10:15 d2.utils.events]: eta: 0:17:44  iter: 119  total_loss: 1.876  loss_cls: 1.382  loss_box_reg: 0.493  time: 0.5699  data_time: 0.0072  lr: 0.000012  max_mem: 10456M
[10/24 10:10:27 d2.utils.events]: eta: 0:17:34  iter: 139  total_loss: 1.894  loss_cls: 1.398  loss_box_reg: 0.485  time: 0.5705  data_time: 0.0064  lr: 0.000014  max_mem: 10456M
[10/24 10:10:38 d2.utils.events]: eta: 0:17:23  iter: 159  total_loss: 1.939  loss_cls: 1.448  loss_box_reg: 0.488  time: 0.5702  data_time: 0.0068  lr: 0.000016  max_mem: 10458M
[10/24 10:10:49 d2.utils.events]: eta: 0:17:12  iter: 179  total_loss: 1.910  loss_cls: 1.422  loss_box_reg: 0.502  time: 0.5701  data_time: 0.0068  lr: 0.000018  max_mem: 10458M
[10/24 10:11:01 d2.utils.events]: eta: 0:17:02  iter: 199  total_loss: 1.918  loss_cls: 1.435  loss_box_reg: 0.489  time: 0.5711  data_time: 0.0069  lr: 0.000020  max_mem: 10458M
[10/24 10:11:13 d2.utils.events]: eta: 0:16:51  iter: 219  total_loss: 1.922  loss_cls: 1.442  loss_box_reg: 0.478  time: 0.5716  data_time: 0.0069  lr: 0.000022  max_mem: 10458M
[10/24 10:11:24 d2.utils.events]: eta: 0:16:39  iter: 239  total_loss: 1.935  loss_cls: 1.441  loss_box_reg: 0.494  time: 0.5722  data_time: 0.0069  lr: 0.000024  max_mem: 10458M
[10/24 10:11:36 d2.utils.events]: eta: 0:16:28  iter: 259  total_loss: 1.924  loss_cls: 1.430  loss_box_reg: 0.495  time: 0.5722  data_time: 0.0106  lr: 0.000026  max_mem: 10458M
[10/24 10:11:47 d2.utils.events]: eta: 0:16:18  iter: 279  total_loss: 1.903  loss_cls: 1.400  loss_box_reg: 0.497  time: 0.5728  data_time: 0.0068  lr: 0.000028  max_mem: 10458M
[10/24 10:11:59 d2.utils.events]: eta: 0:16:07  iter: 299  total_loss: 1.929  loss_cls: 1.437  loss_box_reg: 0.486  time: 0.5733  data_time: 0.0068  lr: 0.000030  max_mem: 10458M
[10/24 10:12:10 d2.utils.events]: eta: 0:15:56  iter: 319  total_loss: 1.956  loss_cls: 1.467  loss_box_reg: 0.479  time: 0.5731  data_time: 0.0069  lr: 0.000032  max_mem: 10458M
[10/24 10:12:22 d2.utils.events]: eta: 0:15:44  iter: 339  total_loss: 1.910  loss_cls: 1.429  loss_box_reg: 0.491  time: 0.5736  data_time: 0.0068  lr: 0.000034  max_mem: 10458M
[10/24 10:12:33 d2.utils.events]: eta: 0:15:33  iter: 359  total_loss: 1.904  loss_cls: 1.409  loss_box_reg: 0.483  time: 0.5734  data_time: 0.0068  lr: 0.000036  max_mem: 10458M
[10/24 10:12:45 d2.utils.events]: eta: 0:15:22  iter: 379  total_loss: 1.951  loss_cls: 1.463  loss_box_reg: 0.488  time: 0.5735  data_time: 0.0067  lr: 0.000038  max_mem: 10458M
[10/24 10:12:56 d2.utils.events]: eta: 0:15:11  iter: 399  total_loss: 1.918  loss_cls: 1.423  loss_box_reg: 0.484  time: 0.5739  data_time: 0.0067  lr: 0.000040  max_mem: 10458M
[10/24 10:13:08 d2.utils.events]: eta: 0:15:00  iter: 419  total_loss: 1.881  loss_cls: 1.418  loss_box_reg: 0.490  time: 0.5743  data_time: 0.0067  lr: 0.000042  max_mem: 10458M
[10/24 10:13:20 d2.utils.events]: eta: 0:14:49  iter: 439  total_loss: 1.878  loss_cls: 1.404  loss_box_reg: 0.486  time: 0.5747  data_time: 0.0067  lr: 0.000044  max_mem: 10458M
[10/24 10:13:31 d2.utils.events]: eta: 0:14:37  iter: 459  total_loss: 1.890  loss_cls: 1.393  loss_box_reg: 0.489  time: 0.5749  data_time: 0.0069  lr: 0.000046  max_mem: 10458M
[10/24 10:13:43 d2.utils.events]: eta: 0:14:26  iter: 479  total_loss: 1.900  loss_cls: 1.409  loss_box_reg: 0.485  time: 0.5750  data_time: 0.0149  lr: 0.000048  max_mem: 10458M
[10/24 10:13:54 d2.utils.events]: eta: 0:14:15  iter: 499  total_loss: 1.906  loss_cls: 1.423  loss_box_reg: 0.482  time: 0.5749  data_time: 0.0067  lr: 0.000050  max_mem: 10458M
[10/24 10:14:06 d2.utils.events]: eta: 0:14:04  iter: 519  total_loss: 1.886  loss_cls: 1.405  loss_box_reg: 0.483  time: 0.5751  data_time: 0.0071  lr: 0.000052  max_mem: 10458M
[10/24 10:14:18 d2.utils.events]: eta: 0:13:52  iter: 539  total_loss: 1.855  loss_cls: 1.369  loss_box_reg: 0.480  time: 0.5752  data_time: 0.0070  lr: 0.000054  max_mem: 10458M
[10/24 10:14:29 d2.utils.events]: eta: 0:13:41  iter: 559  total_loss: 1.888  loss_cls: 1.351  loss_box_reg: 0.483  time: 0.5755  data_time: 0.0168  lr: 0.000056  max_mem: 10458M
[10/24 10:14:41 d2.utils.events]: eta: 0:13:30  iter: 579  total_loss: 1.895  loss_cls: 1.415  loss_box_reg: 0.473  time: 0.5755  data_time: 0.0071  lr: 0.000058  max_mem: 10458M
[10/24 10:14:52 d2.utils.events]: eta: 0:13:19  iter: 599  total_loss: 1.913  loss_cls: 1.411  loss_box_reg: 0.487  time: 0.5756  data_time: 0.0073  lr: 0.000060  max_mem: 10458M
[10/24 10:15:04 d2.utils.events]: eta: 0:13:08  iter: 619  total_loss: 1.899  loss_cls: 1.422  loss_box_reg: 0.485  time: 0.5756  data_time: 0.0068  lr: 0.000062  max_mem: 10458M
[10/24 10:15:15 d2.utils.events]: eta: 0:12:56  iter: 639  total_loss: 1.911  loss_cls: 1.440  loss_box_reg: 0.475  time: 0.5757  data_time: 0.0068  lr: 0.000064  max_mem: 10458M
[10/24 10:15:27 d2.utils.events]: eta: 0:12:45  iter: 659  total_loss: 1.890  loss_cls: 1.413  loss_box_reg: 0.469  time: 0.5756  data_time: 0.0067  lr: 0.000066  max_mem: 10458M
[10/24 10:15:39 d2.utils.events]: eta: 0:12:34  iter: 679  total_loss: 1.928  loss_cls: 1.432  loss_box_reg: 0.478  time: 0.5759  data_time: 0.0071  lr: 0.000068  max_mem: 10458M
[10/24 10:15:50 d2.utils.events]: eta: 0:12:22  iter: 699  total_loss: 1.893  loss_cls: 1.415  loss_box_reg: 0.476  time: 0.5760  data_time: 0.0070  lr: 0.000070  max_mem: 10458M
[10/24 10:16:02 d2.utils.events]: eta: 0:12:11  iter: 719  total_loss: 1.861  loss_cls: 1.403  loss_box_reg: 0.461  time: 0.5760  data_time: 0.0072  lr: 0.000072  max_mem: 10458M
[10/24 10:16:14 d2.utils.events]: eta: 0:12:00  iter: 739  total_loss: 1.921  loss_cls: 1.435  loss_box_reg: 0.472  time: 0.5763  data_time: 0.0071  lr: 0.000074  max_mem: 10458M
[10/24 10:16:25 d2.utils.events]: eta: 0:11:48  iter: 759  total_loss: 1.896  loss_cls: 1.397  loss_box_reg: 0.466  time: 0.5764  data_time: 0.0074  lr: 0.000076  max_mem: 10458M
[10/24 10:16:37 d2.utils.events]: eta: 0:11:37  iter: 779  total_loss: 1.904  loss_cls: 1.462  loss_box_reg: 0.460  time: 0.5763  data_time: 0.0071  lr: 0.000078  max_mem: 10458M
[10/24 10:16:48 d2.utils.events]: eta: 0:11:26  iter: 799  total_loss: 1.847  loss_cls: 1.406  loss_box_reg: 0.467  time: 0.5764  data_time: 0.0072  lr: 0.000080  max_mem: 10458M
[10/24 10:17:00 d2.utils.events]: eta: 0:11:14  iter: 819  total_loss: 1.859  loss_cls: 1.404  loss_box_reg: 0.463  time: 0.5766  data_time: 0.0064  lr: 0.000082  max_mem: 10458M
[10/24 10:17:12 d2.utils.events]: eta: 0:11:03  iter: 839  total_loss: 1.850  loss_cls: 1.400  loss_box_reg: 0.455  time: 0.5768  data_time: 0.0068  lr: 0.000084  max_mem: 10458M
[10/24 10:17:23 d2.utils.events]: eta: 0:10:51  iter: 859  total_loss: 1.881  loss_cls: 1.419  loss_box_reg: 0.458  time: 0.5767  data_time: 0.0067  lr: 0.000086  max_mem: 10458M
[10/24 10:17:35 d2.utils.events]: eta: 0:10:40  iter: 879  total_loss: 1.885  loss_cls: 1.439  loss_box_reg: 0.455  time: 0.5767  data_time: 0.0085  lr: 0.000088  max_mem: 10458M
[10/24 10:17:46 d2.utils.events]: eta: 0:10:29  iter: 899  total_loss: 1.907  loss_cls: 1.454  loss_box_reg: 0.456  time: 0.5769  data_time: 0.0067  lr: 0.000090  max_mem: 10458M
[10/24 10:17:58 d2.utils.events]: eta: 0:10:17  iter: 919  total_loss: 1.859  loss_cls: 1.437  loss_box_reg: 0.445  time: 0.5770  data_time: 0.0086  lr: 0.000092  max_mem: 10458M
[10/24 10:18:10 d2.utils.events]: eta: 0:10:06  iter: 939  total_loss: 1.906  loss_cls: 1.447  loss_box_reg: 0.443  time: 0.5771  data_time: 0.0067  lr: 0.000094  max_mem: 10458M
[10/24 10:18:21 d2.utils.events]: eta: 0:09:55  iter: 959  total_loss: 1.858  loss_cls: 1.403  loss_box_reg: 0.438  time: 0.5773  data_time: 0.0067  lr: 0.000096  max_mem: 10458M
[10/24 10:18:34 d2.utils.events]: eta: 0:09:43  iter: 979  total_loss: 1.894  loss_cls: 1.418  loss_box_reg: 0.454  time: 0.5783  data_time: 0.0068  lr: 0.000098  max_mem: 10458M
[10/24 10:18:45 d2.utils.events]: eta: 0:09:32  iter: 999  total_loss: 1.826  loss_cls: 1.381  loss_box_reg: 0.435  time: 0.5783  data_time: 0.0068  lr: 0.000100  max_mem: 10458M
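
For what it’s worth, the increase is perfectly linear: the rate grows by 0.000002 every 20 logged iterations, i.e. 1e-7 per iteration. A quick sanity check on that slope (a minimal snippet, with the numbers read off the log above):

# Sanity check: the logged learning rates lie on a straight line.
# Slope read off the log: lr grows by 0.000002 every 20 iterations.
per_iter = 0.000002 / 20   # 1e-7 per iteration
print(per_iter * 20)       # 0.000002 -- matches lr at iter 19
print(per_iter * 1000)     # 0.000100 -- matches lr at iter 999, the apparent target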

And one more thing: the modified version of the trainer (the one in the Jupyter Notebook) does not support multi-GPU training. I had a look at a couple of issues where the model diverges and the loss becomes NaN after a couple of hundred iterations, and as I found out, the hyperparameters are really sensitive to batch size and learning rate (see the scaling sketch below).
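
One widely cited heuristic for that sensitivity is the linear scaling rule from Goyal et al., “Accurate, Large Minibatch SGD”: when the effective batch size changes by a factor k, scale the learning rate by k as well. A minimal sketch of the arithmetic follows; all the numbers are illustrative assumptions, not values from any Detectron2 config:

# Linear scaling rule (Goyal et al., 2017), with made-up example numbers:
# if you shrink the batch by 4x, shrink the learning rate by 4x too.
reference_batch = 16     # batch size a reference config was tuned for (assumed)
reference_lr = 0.02      # learning rate paired with that batch size (assumed)

my_batch = 4             # e.g. what fits on a single GPU
scaled_lr = reference_lr * my_batch / reference_batch
print(scaled_lr)         # 0.005 -- a starting point to tune from, not a guarantee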

I’ve changed a couple of lines in detectron2/engine/defaults.py to support DistributedDataParallel directly; I’ll check how this approach works and post the results in the comments 😃
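
For anyone attempting the same, the snippet below is a generic PyTorch DistributedDataParallel wrap, not the actual patch to defaults.py; the function name and local_rank handling are illustrative:

# Generic sketch of wrapping a model in DistributedDataParallel.
# NOT the actual change made to detectron2/engine/defaults.py -- just the
# standard PyTorch pattern such a change would follow.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

def wrap_model(model: torch.nn.Module, local_rank: int) -> torch.nn.Module:
    # Assumes the process group was already initialized in each worker,
    # e.g. via dist.init_process_group(backend="nccl").
    model = model.to(f"cuda:{local_rank}")
    if dist.is_initialized() and dist.get_world_size() > 1:
        model = DistributedDataParallel(model, device_ids=[local_rank])
    return model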

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

3 reactions
wangg12 commented, Oct 24, 2019

The learning rate will increase during the warmup phase.
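
For context: Detectron2’s default schedule ramps the learning rate linearly up to SOLVER.BASE_LR over the first SOLVER.WARMUP_ITERS iterations, then applies step decay at the milestones in SOLVER.STEPS. The following is a minimal sketch of that linear-warmup-plus-step-decay shape, not Detectron2’s actual code; the constants are assumptions chosen to match the log above:

# Linear warmup followed by step decay (sketch; constants are assumptions).
BASE_LR = 0.0001        # cf. cfg.SOLVER.BASE_LR
WARMUP_ITERS = 1000     # cf. cfg.SOLVER.WARMUP_ITERS
WARMUP_FACTOR = 0.001   # cf. cfg.SOLVER.WARMUP_FACTOR
STEPS = (3000, 4000)    # cf. cfg.SOLVER.STEPS: iterations where lr is decayed
GAMMA = 0.1             # cf. cfg.SOLVER.GAMMA: decay multiplier

def lr_at(it: int) -> float:
    if it < WARMUP_ITERS:
        # Interpolate from BASE_LR * WARMUP_FACTOR up to BASE_LR.
        alpha = it / WARMUP_ITERS
        return BASE_LR * (WARMUP_FACTOR * (1 - alpha) + alpha)
    # After warmup: multiply by GAMMA for each milestone already passed.
    return BASE_LR * GAMMA ** sum(it >= s for s in STEPS)

print(lr_at(19))    # ~0.000002 -- matches the first log line above
print(lr_at(999))   # ~0.000100 -- warmup complete
print(lr_at(3500))  # 0.000010 -- after the first decay step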

0 reactions
ppwwyyxx commented, Oct 24, 2019

> Could you please elaborate on multi-gpu training using modified trainer in Jupyter Notebook?

I don’t quite understand what you would like to know

Read more comments on GitHub >

Top Results From Across the Web

How to Use Learning Rate Annealing with Neural Networks?
Changing the learning rate for your stochastic gradient descent optimization technique can improve performance while also cutting down on …

Learning-Rate Annealing Methods for Deep Neural Networks
The learning rate is increased linearly or non-linearly to a specific value in the first few epochs, and then shrinks to zero. …

Setting the learning rate of your neural network. - Jeremy Jordan
The most popular form of learning rate annealing is a step decay where the learning rate is reduced by some percentage after a …

Difference between cycling learning rate and learning rate …
Learning rate (LR) annealing is you start learning at high LR and decrease it when you get closer to the local minimum. The …

A Newbie's Guide to Stochastic Gradient Descent With Restarts
The first technique is Stochastic Gradient Descent with Restarts (SGDR), a variant of learning rate annealing, which gradually decreases the learning rate …
