Learning Rate is increasing instead of annealing
❓ Questions and Help
Hey! Amazing work. I've struggled a couple of times with tuning the previous version of Detectron, but this rewrite works really well and makes it possible to tune a model with a minimum of time and energy.
My question is: why is the learning rate increasing instead of annealing step by step, as shown below?
[10/24 10:09:06 d2.engine.train_loop]: Starting training from iteration 0
[10/24 10:09:18 d2.utils.events]: eta: 0:18:36 iter: 19 total_loss: 1.902 loss_cls: 1.429 loss_box_reg: 0.487 time: 0.5684 data_time: 0.0405 lr: 0.000002 max_mem: 10456M
[10/24 10:09:29 d2.utils.events]: eta: 0:18:28 iter: 39 total_loss: 1.930 loss_cls: 1.435 loss_box_reg: 0.491 time: 0.5692 data_time: 0.0068 lr: 0.000004 max_mem: 10456M
[10/24 10:09:41 d2.utils.events]: eta: 0:18:17 iter: 59 total_loss: 1.896 loss_cls: 1.399 loss_box_reg: 0.500 time: 0.5695 data_time: 0.0078 lr: 0.000006 max_mem: 10456M
[10/24 10:09:52 d2.utils.events]: eta: 0:18:06 iter: 79 total_loss: 1.918 loss_cls: 1.416 loss_box_reg: 0.494 time: 0.5697 data_time: 0.0060 lr: 0.000008 max_mem: 10456M
[10/24 10:10:04 d2.utils.events]: eta: 0:17:55 iter: 99 total_loss: 1.888 loss_cls: 1.392 loss_box_reg: 0.496 time: 0.5695 data_time: 0.0071 lr: 0.000010 max_mem: 10456M
[10/24 10:10:15 d2.utils.events]: eta: 0:17:44 iter: 119 total_loss: 1.876 loss_cls: 1.382 loss_box_reg: 0.493 time: 0.5699 data_time: 0.0072 lr: 0.000012 max_mem: 10456M
[10/24 10:10:27 d2.utils.events]: eta: 0:17:34 iter: 139 total_loss: 1.894 loss_cls: 1.398 loss_box_reg: 0.485 time: 0.5705 data_time: 0.0064 lr: 0.000014 max_mem: 10456M
[10/24 10:10:38 d2.utils.events]: eta: 0:17:23 iter: 159 total_loss: 1.939 loss_cls: 1.448 loss_box_reg: 0.488 time: 0.5702 data_time: 0.0068 lr: 0.000016 max_mem: 10458M
[10/24 10:10:49 d2.utils.events]: eta: 0:17:12 iter: 179 total_loss: 1.910 loss_cls: 1.422 loss_box_reg: 0.502 time: 0.5701 data_time: 0.0068 lr: 0.000018 max_mem: 10458M
[10/24 10:11:01 d2.utils.events]: eta: 0:17:02 iter: 199 total_loss: 1.918 loss_cls: 1.435 loss_box_reg: 0.489 time: 0.5711 data_time: 0.0069 lr: 0.000020 max_mem: 10458M
[10/24 10:11:13 d2.utils.events]: eta: 0:16:51 iter: 219 total_loss: 1.922 loss_cls: 1.442 loss_box_reg: 0.478 time: 0.5716 data_time: 0.0069 lr: 0.000022 max_mem: 10458M
[10/24 10:11:24 d2.utils.events]: eta: 0:16:39 iter: 239 total_loss: 1.935 loss_cls: 1.441 loss_box_reg: 0.494 time: 0.5722 data_time: 0.0069 lr: 0.000024 max_mem: 10458M
[10/24 10:11:36 d2.utils.events]: eta: 0:16:28 iter: 259 total_loss: 1.924 loss_cls: 1.430 loss_box_reg: 0.495 time: 0.5722 data_time: 0.0106 lr: 0.000026 max_mem: 10458M
[10/24 10:11:47 d2.utils.events]: eta: 0:16:18 iter: 279 total_loss: 1.903 loss_cls: 1.400 loss_box_reg: 0.497 time: 0.5728 data_time: 0.0068 lr: 0.000028 max_mem: 10458M
[10/24 10:11:59 d2.utils.events]: eta: 0:16:07 iter: 299 total_loss: 1.929 loss_cls: 1.437 loss_box_reg: 0.486 time: 0.5733 data_time: 0.0068 lr: 0.000030 max_mem: 10458M
[10/24 10:12:10 d2.utils.events]: eta: 0:15:56 iter: 319 total_loss: 1.956 loss_cls: 1.467 loss_box_reg: 0.479 time: 0.5731 data_time: 0.0069 lr: 0.000032 max_mem: 10458M
[10/24 10:12:22 d2.utils.events]: eta: 0:15:44 iter: 339 total_loss: 1.910 loss_cls: 1.429 loss_box_reg: 0.491 time: 0.5736 data_time: 0.0068 lr: 0.000034 max_mem: 10458M
[10/24 10:12:33 d2.utils.events]: eta: 0:15:33 iter: 359 total_loss: 1.904 loss_cls: 1.409 loss_box_reg: 0.483 time: 0.5734 data_time: 0.0068 lr: 0.000036 max_mem: 10458M
[10/24 10:12:45 d2.utils.events]: eta: 0:15:22 iter: 379 total_loss: 1.951 loss_cls: 1.463 loss_box_reg: 0.488 time: 0.5735 data_time: 0.0067 lr: 0.000038 max_mem: 10458M
[10/24 10:12:56 d2.utils.events]: eta: 0:15:11 iter: 399 total_loss: 1.918 loss_cls: 1.423 loss_box_reg: 0.484 time: 0.5739 data_time: 0.0067 lr: 0.000040 max_mem: 10458M
[10/24 10:13:08 d2.utils.events]: eta: 0:15:00 iter: 419 total_loss: 1.881 loss_cls: 1.418 loss_box_reg: 0.490 time: 0.5743 data_time: 0.0067 lr: 0.000042 max_mem: 10458M
[10/24 10:13:20 d2.utils.events]: eta: 0:14:49 iter: 439 total_loss: 1.878 loss_cls: 1.404 loss_box_reg: 0.486 time: 0.5747 data_time: 0.0067 lr: 0.000044 max_mem: 10458M
[10/24 10:13:31 d2.utils.events]: eta: 0:14:37 iter: 459 total_loss: 1.890 loss_cls: 1.393 loss_box_reg: 0.489 time: 0.5749 data_time: 0.0069 lr: 0.000046 max_mem: 10458M
[10/24 10:13:43 d2.utils.events]: eta: 0:14:26 iter: 479 total_loss: 1.900 loss_cls: 1.409 loss_box_reg: 0.485 time: 0.5750 data_time: 0.0149 lr: 0.000048 max_mem: 10458M
[10/24 10:13:54 d2.utils.events]: eta: 0:14:15 iter: 499 total_loss: 1.906 loss_cls: 1.423 loss_box_reg: 0.482 time: 0.5749 data_time: 0.0067 lr: 0.000050 max_mem: 10458M
[10/24 10:14:06 d2.utils.events]: eta: 0:14:04 iter: 519 total_loss: 1.886 loss_cls: 1.405 loss_box_reg: 0.483 time: 0.5751 data_time: 0.0071 lr: 0.000052 max_mem: 10458M
[10/24 10:14:18 d2.utils.events]: eta: 0:13:52 iter: 539 total_loss: 1.855 loss_cls: 1.369 loss_box_reg: 0.480 time: 0.5752 data_time: 0.0070 lr: 0.000054 max_mem: 10458M
[10/24 10:14:29 d2.utils.events]: eta: 0:13:41 iter: 559 total_loss: 1.888 loss_cls: 1.351 loss_box_reg: 0.483 time: 0.5755 data_time: 0.0168 lr: 0.000056 max_mem: 10458M
[10/24 10:14:41 d2.utils.events]: eta: 0:13:30 iter: 579 total_loss: 1.895 loss_cls: 1.415 loss_box_reg: 0.473 time: 0.5755 data_time: 0.0071 lr: 0.000058 max_mem: 10458M
[10/24 10:14:52 d2.utils.events]: eta: 0:13:19 iter: 599 total_loss: 1.913 loss_cls: 1.411 loss_box_reg: 0.487 time: 0.5756 data_time: 0.0073 lr: 0.000060 max_mem: 10458M
[10/24 10:15:04 d2.utils.events]: eta: 0:13:08 iter: 619 total_loss: 1.899 loss_cls: 1.422 loss_box_reg: 0.485 time: 0.5756 data_time: 0.0068 lr: 0.000062 max_mem: 10458M
[10/24 10:15:15 d2.utils.events]: eta: 0:12:56 iter: 639 total_loss: 1.911 loss_cls: 1.440 loss_box_reg: 0.475 time: 0.5757 data_time: 0.0068 lr: 0.000064 max_mem: 10458M
[10/24 10:15:27 d2.utils.events]: eta: 0:12:45 iter: 659 total_loss: 1.890 loss_cls: 1.413 loss_box_reg: 0.469 time: 0.5756 data_time: 0.0067 lr: 0.000066 max_mem: 10458M
[10/24 10:15:39 d2.utils.events]: eta: 0:12:34 iter: 679 total_loss: 1.928 loss_cls: 1.432 loss_box_reg: 0.478 time: 0.5759 data_time: 0.0071 lr: 0.000068 max_mem: 10458M
[10/24 10:15:50 d2.utils.events]: eta: 0:12:22 iter: 699 total_loss: 1.893 loss_cls: 1.415 loss_box_reg: 0.476 time: 0.5760 data_time: 0.0070 lr: 0.000070 max_mem: 10458M
[10/24 10:16:02 d2.utils.events]: eta: 0:12:11 iter: 719 total_loss: 1.861 loss_cls: 1.403 loss_box_reg: 0.461 time: 0.5760 data_time: 0.0072 lr: 0.000072 max_mem: 10458M
[10/24 10:16:14 d2.utils.events]: eta: 0:12:00 iter: 739 total_loss: 1.921 loss_cls: 1.435 loss_box_reg: 0.472 time: 0.5763 data_time: 0.0071 lr: 0.000074 max_mem: 10458M
[10/24 10:16:25 d2.utils.events]: eta: 0:11:48 iter: 759 total_loss: 1.896 loss_cls: 1.397 loss_box_reg: 0.466 time: 0.5764 data_time: 0.0074 lr: 0.000076 max_mem: 10458M
[10/24 10:16:37 d2.utils.events]: eta: 0:11:37 iter: 779 total_loss: 1.904 loss_cls: 1.462 loss_box_reg: 0.460 time: 0.5763 data_time: 0.0071 lr: 0.000078 max_mem: 10458M
[10/24 10:16:48 d2.utils.events]: eta: 0:11:26 iter: 799 total_loss: 1.847 loss_cls: 1.406 loss_box_reg: 0.467 time: 0.5764 data_time: 0.0072 lr: 0.000080 max_mem: 10458M
[10/24 10:17:00 d2.utils.events]: eta: 0:11:14 iter: 819 total_loss: 1.859 loss_cls: 1.404 loss_box_reg: 0.463 time: 0.5766 data_time: 0.0064 lr: 0.000082 max_mem: 10458M
[10/24 10:17:12 d2.utils.events]: eta: 0:11:03 iter: 839 total_loss: 1.850 loss_cls: 1.400 loss_box_reg: 0.455 time: 0.5768 data_time: 0.0068 lr: 0.000084 max_mem: 10458M
[10/24 10:17:23 d2.utils.events]: eta: 0:10:51 iter: 859 total_loss: 1.881 loss_cls: 1.419 loss_box_reg: 0.458 time: 0.5767 data_time: 0.0067 lr: 0.000086 max_mem: 10458M
[10/24 10:17:35 d2.utils.events]: eta: 0:10:40 iter: 879 total_loss: 1.885 loss_cls: 1.439 loss_box_reg: 0.455 time: 0.5767 data_time: 0.0085 lr: 0.000088 max_mem: 10458M
[10/24 10:17:46 d2.utils.events]: eta: 0:10:29 iter: 899 total_loss: 1.907 loss_cls: 1.454 loss_box_reg: 0.456 time: 0.5769 data_time: 0.0067 lr: 0.000090 max_mem: 10458M
[10/24 10:17:58 d2.utils.events]: eta: 0:10:17 iter: 919 total_loss: 1.859 loss_cls: 1.437 loss_box_reg: 0.445 time: 0.5770 data_time: 0.0086 lr: 0.000092 max_mem: 10458M
[10/24 10:18:10 d2.utils.events]: eta: 0:10:06 iter: 939 total_loss: 1.906 loss_cls: 1.447 loss_box_reg: 0.443 time: 0.5771 data_time: 0.0067 lr: 0.000094 max_mem: 10458M
[10/24 10:18:21 d2.utils.events]: eta: 0:09:55 iter: 959 total_loss: 1.858 loss_cls: 1.403 loss_box_reg: 0.438 time: 0.5773 data_time: 0.0067 lr: 0.000096 max_mem: 10458M
[10/24 10:18:34 d2.utils.events]: eta: 0:09:43 iter: 979 total_loss: 1.894 loss_cls: 1.418 loss_box_reg: 0.454 time: 0.5783 data_time: 0.0068 lr: 0.000098 max_mem: 10458M
[10/24 10:18:45 d2.utils.events]: eta: 0:09:32 iter: 999 total_loss: 1.826 loss_cls: 1.381 loss_box_reg: 0.435 time: 0.5783 data_time: 0.0068 lr: 0.000100 max_mem: 10458M
And one more thing: the modified version of the trainer (seen in the Jupyter notebook) does not support multi-GPU training. I had a look at a couple of issues where the model diverges and the loss becomes NaN after a few hundred iterations, and as far as I can tell the hyperparameters are really sensitive to the batch size and learning rate.
I've changed a couple of lines in detectron2/engine/defaults.py to support DistributedDataParallel out of the box. I'll check how this approach works and post the results in the comments 😃
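For reference, a minimal sketch of multi-GPU training with the stock API, without patching defaults.py: detectron2's launch utility spawns one process per GPU, and DefaultTrainer wraps the model in DistributedDataParallel whenever more than one process is running. The dataset name, GPU count, and batch-size/learning-rate values below are placeholder assumptions, not the settings from this issue.

```python
# Sketch only: multi-GPU training via detectron2's stock launcher.
# Dataset name, GPU count, and solver values are placeholders.
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer, launch


def main():
    cfg = get_cfg()
    # merge your model/dataset config here, e.g. cfg.merge_from_file(...)
    cfg.DATASETS.TRAIN = ("my_dataset_train",)  # hypothetical registered dataset

    # Linear scaling rule: if you multiply the total batch size by k,
    # multiply the base learning rate by k as well (example values).
    cfg.SOLVER.IMS_PER_BATCH = 16   # total batch size across all GPUs
    cfg.SOLVER.BASE_LR = 0.02       # scaled together with IMS_PER_BATCH

    trainer = DefaultTrainer(cfg)   # wraps the model in DistributedDataParallel
                                    # automatically when world_size > 1
    trainer.resume_or_load(resume=False)
    trainer.train()


if __name__ == "__main__":
    # Spawns one worker process per GPU.
    launch(main, num_gpus_per_machine=2, dist_url="auto")
```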
Top GitHub Comments
The learning rate will increase during the warmup phase.
I don’t quite understand what you would like to know
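In other words, the ramp in the log above is the warmup phase, not the decay schedule: Detectron2's default WarmupMultiStepLR scheduler increases the learning rate linearly for SOLVER.WARMUP_ITERS iterations and only afterwards applies the step decay at SOLVER.STEPS. A minimal sketch of the relevant config keys (the numeric values are illustrative assumptions, not the ones from this run):

```python
# Sketch of where warmup is configured (key names from detectron2/config/defaults.py;
# the numbers below are illustrative, not the values used in this issue).
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.SOLVER.BASE_LR = 0.0001           # target LR reached at the end of warmup
cfg.SOLVER.WARMUP_METHOD = "linear"   # LR ramps linearly from ~0 up to BASE_LR
cfg.SOLVER.WARMUP_FACTOR = 1.0 / 1000
cfg.SOLVER.WARMUP_ITERS = 1000        # the ramp visible in the log above
cfg.SOLVER.STEPS = (3000, 4000)       # after warmup, LR is multiplied by GAMMA here
cfg.SOLVER.GAMMA = 0.1                # this step decay is the "annealing" part
```

If the warmup looks too long relative to a short schedule, lowering SOLVER.WARMUP_ITERS gets the learning rate to its base value sooner, after which the step decay takes over.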