Increasing batch time when training e2e_mask_rcnn_R_50_FPN_1x

❓ Questions and Help

Hi @fmassa, thanks for your implementation. When I retrain the e2e_mask_rcnn_R_50_FPN_1x model with the default configuration, the time per iteration increases gradually at the beginning of training, so I logged the time spent in each step of an iteration. As the log below shows, both the forward and backward times increase as training goes on, and so does the “eta”.

Another confusing issue is that GPU memory consumption also increases at the beginning of training. Although it stabilizes later, the value (3273 in my case) doesn’t match the memory consumption reported by nvidia-smi.

Thanks for your attention! 😃
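On the mismatch between `max mem` and nvidia-smi mentioned above, here is a minimal sketch for comparing the two from inside the training process. It assumes the logged `max mem` is derived from `torch.cuda.max_memory_allocated()`, which the log itself does not confirm. That counter only covers memory occupied by tensors, whereas nvidia-smi also counts blocks held by PyTorch's caching allocator and the CUDA context, so nvidia-smi would be expected to read higher. The helper names `bytes_to_mib` and `report_gpu_memory` are hypothetical.

```python
def bytes_to_mib(n_bytes):
    """Convert a byte count to MiB, the unit nvidia-smi reports."""
    return n_bytes / (1024 * 1024)


def report_gpu_memory(device=0):
    """Return peak tensor memory and allocator-held memory, both in MiB.

    Requires PyTorch with CUDA; imported lazily so bytes_to_mib works without it.
    """
    import torch

    # Peak memory occupied by tensors since the start of the process.
    peak = torch.cuda.max_memory_allocated(device)
    # Memory held by the caching allocator. Newer PyTorch releases call this
    # memory_reserved; PyTorch 1.1.0 exposes it as memory_cached.
    reserved_fn = getattr(torch.cuda, "memory_reserved", None) or torch.cuda.memory_cached
    reserved = reserved_fn(device)
    return {
        "peak_allocated_mib": bytes_to_mib(peak),
        "allocator_held_mib": bytes_to_mib(reserved),
    }
```

Calling `report_gpu_memory()` every few hundred iterations and comparing `allocator_held_mib` with the nvidia-smi reading should show how much of the gap comes from cached blocks. Whatever remains is likely the CUDA context plus any other processes on the same GPU.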

My environment information:

PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Linux Server 7.2
GCC version: (GCC) 4.9.2
CMake version: version 3.8.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: 
GPU 0: Tesla P100-PCIE-16GB
GPU 1: Tesla P100-PCIE-16GB

Nvidia driver version: 384.145
cuDNN version: libcudnn.so.7.6.0

Versions of relevant libraries:
[pip] numpy==1.16.4
[pip] numpydoc==0.9.1
[pip] torch==1.1.0
[pip] torchvision==0.2.2
[conda] blas                      1.0                         mkl  
[conda] mkl                       2019.4                      243  
[conda] mkl-include               2019.4                      243  
[conda] mkl-service               2.0.2            py37h7b6447c_0  
[conda] mkl_fft                   1.0.12           py37ha843d7b_0  
[conda] mkl_random                1.0.2            py37hd81dba3_0  
[conda] torch                     1.1.0                    pypi_0    pypi
        Pillow (6.1.0)

The training log of e2e_mask_rcnn_R_50_FPN_1x:

INFO: Start training
INFO: eta: 12:20:53  iter: 20  loss: 1.8344 (2.1594)  loss_box_reg: 0.0809 (0.0799)  loss_classifier: 0.4756 (0.7001)  loss_mask: 0.7108 (0.7706)  loss_objectness: 0.3915 (0.4632)  loss_rpn_box_reg: 0.0932 (0.1456)  time: 0.4128 (0.4940)  data: 0.0124 (0.0459)  cpu2cuda: 0.1303 (0.1154)  forward: 0.1933 (0.2479)  backward: 0.0574 (0.0698)  lr: 0.007200  max mem: 2957
INFO: eta: 11:20:44  iter: 40  loss: 1.4896 (1.8123)  loss_box_reg: 0.0897 (0.0842)  loss_classifier: 0.4175 (0.5472)  loss_mask: 0.6901 (0.7303)  loss_objectness: 0.2093 (0.3408)  loss_rpn_box_reg: 0.0544 (0.1098)  time: 0.3988 (0.4540)  data: 0.0095 (0.0287)  cpu2cuda: 0.1340 (0.1209)  forward: 0.1884 (0.2195)  backward: 0.0594 (0.0687)  lr: 0.007733  max mem: 2957
INFO: eta: 11:08:29  iter: 60  loss: 1.4800 (1.7611)  loss_box_reg: 0.0832 (0.0933)  loss_classifier: 0.3759 (0.5120)  loss_mask: 0.6892 (0.7161)  loss_objectness: 0.2313 (0.3252)  loss_rpn_box_reg: 0.0897 (0.1146)  time: 0.4268 (0.4460)  data: 0.0146 (0.0246)  cpu2cuda: 0.1180 (0.1180)  forward: 0.1988 (0.2151)  backward: 0.0683 (0.0728)  lr: 0.008267  max mem: 2957
INFO: eta: 10:51:55  iter: 80  loss: 1.2965 (1.6597)  loss_box_reg: 0.0588 (0.0884)  loss_classifier: 0.2801 (0.4598)  loss_mask: 0.6854 (0.7076)  loss_objectness: 0.1827 (0.2940)  loss_rpn_box_reg: 0.0745 (0.1099)  time: 0.3816 (0.4350)  data: 0.0092 (0.0210)  cpu2cuda: 0.1328 (0.1212)  forward: 0.1893 (0.2088)  backward: 0.0525 (0.0694)  lr: 0.008800  max mem: 2957
INFO: eta: 10:42:00  iter: 100  loss: 1.3173 (1.5981)  loss_box_reg: 0.0571 (0.0844)  loss_classifier: 0.2880 (0.4290)  loss_mask: 0.6679 (0.6994)  loss_objectness: 0.1850 (0.2749)  loss_rpn_box_reg: 0.1171 (0.1105)  time: 0.4012 (0.4285)  data: 0.0097 (0.0189)  cpu2cuda: 0.1339 (0.1242)  forward: 0.1873 (0.2047)  backward: 0.0537 (0.0665)  lr: 0.009333  max mem: 2957
INFO: eta: 10:39:14  iter: 120  loss: 1.3781 (1.5678)  loss_box_reg: 0.0747 (0.0837)  loss_classifier: 0.2876 (0.4133)  loss_mask: 0.6767 (0.6946)  loss_objectness: 0.2126 (0.2666)  loss_rpn_box_reg: 0.1049 (0.1095)  time: 0.4035 (0.4267)  data: 0.0107 (0.0177)  cpu2cuda: 0.1307 (0.1258)  forward: 0.1943 (0.2034)  backward: 0.0552 (0.0657)  lr: 0.009867  max mem: 2957
INFO: eta: 10:37:59  iter: 140  loss: 1.4151 (1.5545)  loss_box_reg: 0.0892 (0.0864)  loss_classifier: 0.3723 (0.4117)  loss_mask: 0.6731 (0.6920)  loss_objectness: 0.1878 (0.2565)  loss_rpn_box_reg: 0.0732 (0.1079)  time: 0.4150 (0.4260)  data: 0.0112 (0.0169)  cpu2cuda: 0.1350 (0.1266)  forward: 0.1986 (0.2031)  backward: 0.0583 (0.0654)  lr: 0.010400  max mem: 2957
INFO: eta: 10:36:36  iter: 160  loss: 1.3080 (1.5355)  loss_box_reg: 0.0709 (0.0850)  loss_classifier: 0.3262 (0.4041)  loss_mask: 0.6504 (0.6872)  loss_objectness: 0.1965 (0.2505)  loss_rpn_box_reg: 0.0886 (0.1087)  time: 0.4124 (0.4252)  data: 0.0097 (0.0167)  cpu2cuda: 0.1293 (0.1265)  forward: 0.1983 (0.2030)  backward: 0.0583 (0.0656)  lr: 0.010933  max mem: 2957
INFO: eta: 10:40:31  iter: 180  loss: 1.4487 (1.5269)  loss_box_reg: 0.1099 (0.0890)  loss_classifier: 0.3957 (0.4051)  loss_mask: 0.6496 (0.6833)  loss_objectness: 0.1350 (0.2418)  loss_rpn_box_reg: 0.0967 (0.1076)  time: 0.4232 (0.4279)  data: 0.0115 (0.0163)  cpu2cuda: 0.1287 (0.1259)  forward: 0.1917 (0.2033)  backward: 0.0645 (0.0672)  lr: 0.011467  max mem: 2957
INFO: eta: 10:39:20  iter: 200  loss: 1.3171 (1.5171)  loss_box_reg: 0.1233 (0.0926)  loss_classifier: 0.3784 (0.4066)  loss_mask: 0.6538 (0.6797)  loss_objectness: 0.1193 (0.2324)  loss_rpn_box_reg: 0.0555 (0.1059)  time: 0.4112 (0.4272)  data: 0.0131 (0.0161)  cpu2cuda: 0.1209 (0.1253)  forward: 0.1942 (0.2039)  backward: 0.0596 (0.0675)  lr: 0.012000  max mem: 2957
INFO: eta: 10:42:17  iter: 220  loss: 1.4505 (1.5276)  loss_box_reg: 0.1381 (0.0981)  loss_classifier: 0.4531 (0.4163)  loss_mask: 0.6570 (0.6773)  loss_objectness: 0.1509 (0.2292)  loss_rpn_box_reg: 0.0731 (0.1068)  time: 0.4373 (0.4292)  data: 0.0137 (0.0163)  cpu2cuda: 0.1250 (0.1241)  forward: 0.2011 (0.2056)  backward: 0.0675 (0.0692)  lr: 0.012533  max mem: 2957
INFO: eta: 10:43:42  iter: 240  loss: 1.4663 (1.5232)  loss_box_reg: 0.1024 (0.0994)  loss_classifier: 0.4020 (0.4155)  loss_mask: 0.6384 (0.6745)  loss_objectness: 0.1889 (0.2267)  loss_rpn_box_reg: 0.0918 (0.1071)  time: 0.4278 (0.4303)  data: 0.0121 (0.0161)  cpu2cuda: 0.1228 (0.1239)  forward: 0.2125 (0.2065)  backward: 0.0665 (0.0701)  lr: 0.013067  max mem: 2957
INFO: eta: 10:46:49  iter: 260  loss: 1.3772 (1.5198)  loss_box_reg: 0.1254 (0.1028)  loss_classifier: 0.4421 (0.4177)  loss_mask: 0.6168 (0.6702)  loss_objectness: 0.1534 (0.2222)  loss_rpn_box_reg: 0.0947 (0.1069)  time: 0.4475 (0.4325)  data: 0.0181 (0.0163)  cpu2cuda: 0.1118 (0.1222)  forward: 0.2209 (0.2092)  backward: 0.0759 (0.0718)  lr: 0.013600  max mem: 3131
INFO: eta: 10:48:30  iter: 280  loss: 1.4391 (1.5191)  loss_box_reg: 0.1371 (0.1066)  loss_classifier: 0.4118 (0.4199)  loss_mask: 0.5962 (0.6656)  loss_objectness: 0.1755 (0.2190)  loss_rpn_box_reg: 0.0967 (0.1080)  time: 0.4383 (0.4337)  data: 0.0171 (0.0167)  cpu2cuda: 0.1211 (0.1215)  forward: 0.2095 (0.2101)  backward: 0.0667 (0.0723)  lr: 0.014133  max mem: 3131
INFO: eta: 10:51:17  iter: 300  loss: 1.4413 (1.5184)  loss_box_reg: 0.1337 (0.1093)  loss_classifier: 0.4201 (0.4218)  loss_mask: 0.6180 (0.6621)  loss_objectness: 0.1562 (0.2178)  loss_rpn_box_reg: 0.0780 (0.1074)  time: 0.4521 (0.4356)  data: 0.0161 (0.0170)  cpu2cuda: 0.1166 (0.1207)  forward: 0.2226 (0.2109)  backward: 0.0748 (0.0734)  lr: 0.014667  max mem: 3131
INFO: eta: 10:53:10  iter: 320  loss: 1.4700 (1.5158)  loss_box_reg: 0.1329 (0.1118)  loss_classifier: 0.4295 (0.4238)  loss_mask: 0.5635 (0.6574)  loss_objectness: 0.1364 (0.2145)  loss_rpn_box_reg: 0.0643 (0.1083)  time: 0.4628 (0.4370)  data: 0.0129 (0.0168)  cpu2cuda: 0.1140 (0.1196)  forward: 0.2092 (0.2118)  backward: 0.0961 (0.0753)  lr: 0.015200  max mem: 3131
INFO: eta: 10:55:10  iter: 340  loss: 1.5558 (1.5217)  loss_box_reg: 0.1513 (0.1148)  loss_classifier: 0.4643 (0.4279)  loss_mask: 0.6037 (0.6541)  loss_objectness: 0.1849 (0.2147)  loss_rpn_box_reg: 0.0938 (0.1102)  time: 0.4561 (0.4384)  data: 0.0174 (0.0171)  cpu2cuda: 0.1145 (0.1189)  forward: 0.2164 (0.2120)  backward: 0.0767 (0.0760)  lr: 0.015733  max mem: 3131
INFO: eta: 10:56:08  iter: 360  loss: 1.3707 (1.5178)  loss_box_reg: 0.1371 (0.1168)  loss_classifier: 0.4722 (0.4301)  loss_mask: 0.5698 (0.6502)  loss_objectness: 0.1456 (0.2115)  loss_rpn_box_reg: 0.0804 (0.1094)  time: 0.4343 (0.4392)  data: 0.0132 (0.0170)  cpu2cuda: 0.1288 (0.1192)  forward: 0.2081 (0.2118)  backward: 0.0665 (0.0758)  lr: 0.016267  max mem: 3131
INFO: eta: 11:00:07  iter: 380  loss: 1.6307 (1.5196)  loss_box_reg: 0.1565 (0.1186)  loss_classifier: 0.4888 (0.4319)  loss_mask: 0.6033 (0.6476)  loss_objectness: 0.1703 (0.2110)  loss_rpn_box_reg: 0.1381 (0.1105)  time: 0.4972 (0.4419)  data: 0.0153 (0.0171)  cpu2cuda: 0.1180 (0.1189)  forward: 0.2079 (0.2127)  backward: 0.0702 (0.0768)  lr: 0.016800  max mem: 3131
INFO: eta: 11:02:06  iter: 400  loss: 1.3411 (1.5151)  loss_box_reg: 0.1450 (0.1201)  loss_classifier: 0.4000 (0.4316)  loss_mask: 0.5971 (0.6449)  loss_objectness: 0.1077 (0.2084)  loss_rpn_box_reg: 0.0732 (0.1101)  time: 0.4465 (0.4434)  data: 0.0148 (0.0171)  cpu2cuda: 0.0952 (0.1182)  forward: 0.2020 (0.2125)  backward: 0.0933 (0.0782)  lr: 0.017333  max mem: 3131
INFO: eta: 11:01:45  iter: 420  loss: 1.4029 (1.5127)  loss_box_reg: 0.1201 (0.1208)  loss_classifier: 0.3815 (0.4302)  loss_mask: 0.6024 (0.6433)  loss_objectness: 0.1645 (0.2076)  loss_rpn_box_reg: 0.1111 (0.1109)  time: 0.4344 (0.4432)  data: 0.0135 (0.0171)  cpu2cuda: 0.1189 (0.1180)  forward: 0.2010 (0.2124)  backward: 0.0656 (0.0781)  lr: 0.017867  max mem: 3131
INFO: eta: 11:03:44  iter: 440  loss: 1.3946 (1.5125)  loss_box_reg: 0.1232 (0.1219)  loss_classifier: 0.4124 (0.4324)  loss_mask: 0.5625 (0.6399)  loss_objectness: 0.1774 (0.2062)  loss_rpn_box_reg: 0.1292 (0.1122)  time: 0.4359 (0.4447)  data: 0.0158 (0.0173)  cpu2cuda: 0.1210 (0.1180)  forward: 0.2124 (0.2127)  backward: 0.0695 (0.0784)  lr: 0.018400  max mem: 3131
INFO: eta: 11:04:50  iter: 460  loss: 1.5121 (1.5134)  loss_box_reg: 0.1594 (0.1242)  loss_classifier: 0.4827 (0.4355)  loss_mask: 0.5583 (0.6362)  loss_objectness: 0.1660 (0.2046)  loss_rpn_box_reg: 0.1026 (0.1130)  time: 0.4418 (0.4455)  data: 0.0170 (0.0175)  cpu2cuda: 0.1101 (0.1176)  forward: 0.2101 (0.2133)  backward: 0.0697 (0.0788)  lr: 0.018933  max mem: 3131
INFO: eta: 11:04:47  iter: 480  loss: 1.3543 (1.5083)  loss_box_reg: 0.1279 (0.1244)  loss_classifier: 0.3937 (0.4354)  loss_mask: 0.5632 (0.6330)  loss_objectness: 0.1731 (0.2029)  loss_rpn_box_reg: 0.0806 (0.1126)  time: 0.4420 (0.4456)  data: 0.0139 (0.0174)  cpu2cuda: 0.1218 (0.1179)  forward: 0.2139 (0.2137)  backward: 0.0682 (0.0786)  lr: 0.019467  max mem: 3131
INFO: eta: 11:07:36  iter: 500  loss: 1.5172 (1.5122)  loss_box_reg: 0.1848 (0.1269)  loss_classifier: 0.5513 (0.4405)  loss_mask: 0.5547 (0.6303)  loss_objectness: 0.1759 (0.2021)  loss_rpn_box_reg: 0.0831 (0.1124)  time: 0.4883 (0.4476)  data: 0.0177 (0.0175)  cpu2cuda: 0.1091 (0.1173)  forward: 0.2139 (0.2144)  backward: 0.0857 (0.0797)  lr: 0.020000  max mem: 3131
INFO: eta: 11:09:31  iter: 520  loss: 1.4537 (1.5131)  loss_box_reg: 0.1538 (0.1294)  loss_classifier: 0.5222 (0.4445)  loss_mask: 0.5596 (0.6275)  loss_objectness: 0.1241 (0.1995)  loss_rpn_box_reg: 0.0683 (0.1123)  time: 0.5054 (0.4489)  data: 0.0171 (0.0175)  cpu2cuda: 0.0947 (0.1164)  forward: 0.2117 (0.2151)  backward: 0.0931 (0.0809)  lr: 0.020000  max mem: 3131
INFO: eta: 11:08:28  iter: 540  loss: 1.3338 (1.5086)  loss_box_reg: 0.1210 (0.1294)  loss_classifier: 0.3701 (0.4437)  loss_mask: 0.5264 (0.6241)  loss_objectness: 0.1558 (0.1987)  loss_rpn_box_reg: 0.0874 (0.1127)  time: 0.4245 (0.4483)  data: 0.0146 (0.0176)  cpu2cuda: 0.1203 (0.1165)  forward: 0.2159 (0.2151)  backward: 0.0705 (0.0805)  lr: 0.020000  max mem: 3131
INFO: eta: 11:09:55  iter: 560  loss: 1.3118 (1.5056)  loss_box_reg: 0.1342 (0.1303)  loss_classifier: 0.4353 (0.4435)  loss_mask: 0.5320 (0.6217)  loss_objectness: 0.1433 (0.1975)  loss_rpn_box_reg: 0.0967 (0.1126)  time: 0.4589 (0.4494)  data: 0.0180 (0.0177)  cpu2cuda: 0.1188 (0.1164)  forward: 0.2182 (0.2161)  backward: 0.0775 (0.0806)  lr: 0.020000  max mem: 3131
INFO: eta: 11:10:10  iter: 580  loss: 1.3408 (1.5029)  loss_box_reg: 0.1251 (0.1304)  loss_classifier: 0.4113 (0.4428)  loss_mask: 0.5330 (0.6189)  loss_objectness: 0.1415 (0.1982)  loss_rpn_box_reg: 0.0856 (0.1125)  time: 0.4313 (0.4497)  data: 0.0145 (0.0177)  cpu2cuda: 0.1261 (0.1165)  forward: 0.2007 (0.2162)  backward: 0.0685 (0.0804)  lr: 0.020000  max mem: 3131
INFO: eta: 11:08:44  iter: 600  loss: 1.2969 (1.4975)  loss_box_reg: 0.1264 (0.1297)  loss_classifier: 0.3952 (0.4405)  loss_mask: 0.5154 (0.6155)  loss_objectness: 0.2217 (0.1985)  loss_rpn_box_reg: 0.1029 (0.1133)  time: 0.4169 (0.4488)  data: 0.0117 (0.0175)  cpu2cuda: 0.1247 (0.1167)  forward: 0.1943 (0.2159)  backward: 0.0645 (0.0802)  lr: 0.020000  max mem: 3150
INFO: eta: 11:10:06  iter: 620  loss: 1.5374 (1.4976)  loss_box_reg: 0.1542 (0.1308)  loss_classifier: 0.4586 (0.4420)  loss_mask: 0.5166 (0.6127)  loss_objectness: 0.1697 (0.1982)  loss_rpn_box_reg: 0.1198 (0.1140)  time: 0.4781 (0.4498)  data: 0.0155 (0.0175)  cpu2cuda: 0.1057 (0.1163)  forward: 0.2199 (0.2164)  backward: 0.0900 (0.0810)  lr: 0.020000  max mem: 3150
INFO: eta: 11:11:00  iter: 640  loss: 1.3474 (1.4970)  loss_box_reg: 0.1557 (0.1321)  loss_classifier: 0.4703 (0.4449)  loss_mask: 0.5015 (0.6098)  loss_objectness: 0.1281 (0.1964)  loss_rpn_box_reg: 0.0733 (0.1137)  time: 0.4888 (0.4505)  data: 0.0179 (0.0176)  cpu2cuda: 0.1139 (0.1160)  forward: 0.2094 (0.2168)  backward: 0.0787 (0.0813)  lr: 0.020000  max mem: 3150
INFO: eta: 11:11:38  iter: 660  loss: 1.3212 (1.4941)  loss_box_reg: 0.1433 (0.1330)  loss_classifier: 0.4361 (0.4460)  loss_mask: 0.5075 (0.6069)  loss_objectness: 0.1383 (0.1949)  loss_rpn_box_reg: 0.0770 (0.1132)  time: 0.4615 (0.4511)  data: 0.0171 (0.0177)  cpu2cuda: 0.1180 (0.1158)  forward: 0.2084 (0.2168)  backward: 0.0792 (0.0816)  lr: 0.020000  max mem: 3150
INFO: eta: 11:12:27  iter: 680  loss: 1.5639 (1.4951)  loss_box_reg: 0.1397 (0.1339)  loss_classifier: 0.4790 (0.4468)  loss_mask: 0.5087 (0.6038)  loss_objectness: 0.2357 (0.1966)  loss_rpn_box_reg: 0.1220 (0.1140)  time: 0.4541 (0.4517)  data: 0.0121 (0.0176)  cpu2cuda: 0.1308 (0.1160)  forward: 0.2091 (0.2168)  backward: 0.0692 (0.0816)  lr: 0.020000  max mem: 3150
INFO: eta: 11:12:04  iter: 700  loss: 1.4248 (1.4917)  loss_box_reg: 0.1590 (0.1342)  loss_classifier: 0.4564 (0.4470)  loss_mask: 0.4977 (0.6008)  loss_objectness: 0.1468 (0.1956)  loss_rpn_box_reg: 0.0818 (0.1142)  time: 0.4293 (0.4516)  data: 0.0136 (0.0176)  cpu2cuda: 0.1206 (0.1160)  forward: 0.2051 (0.2168)  backward: 0.0674 (0.0816)  lr: 0.020000  max mem: 3150
INFO: eta: 11:13:17  iter: 720  loss: 1.5100 (1.4919)  loss_box_reg: 0.1465 (0.1352)  loss_classifier: 0.4196 (0.4478)  loss_mask: 0.5119 (0.5985)  loss_objectness: 0.1686 (0.1950)  loss_rpn_box_reg: 0.1084 (0.1153)  time: 0.4487 (0.4525)  data: 0.0218 (0.0177)  cpu2cuda: 0.1146 (0.1159)  forward: 0.2206 (0.2172)  backward: 0.0822 (0.0817)  lr: 0.020000  max mem: 3150
INFO: eta: 11:14:15  iter: 740  loss: 1.4570 (1.4908)  loss_box_reg: 0.1518 (0.1361)  loss_classifier: 0.4781 (0.4488)  loss_mask: 0.5070 (0.5961)  loss_objectness: 0.1752 (0.1942)  loss_rpn_box_reg: 0.1007 (0.1155)  time: 0.4968 (0.4532)  data: 0.0163 (0.0178)  cpu2cuda: 0.1122 (0.1157)  forward: 0.2176 (0.2178)  backward: 0.0864 (0.0820)  lr: 0.020000  max mem: 3150
INFO: eta: 11:15:05  iter: 760  loss: 1.4302 (1.4901)  loss_box_reg: 0.1487 (0.1369)  loss_classifier: 0.4933 (0.4504)  loss_mask: 0.5077 (0.5939)  loss_objectness: 0.1690 (0.1937)  loss_rpn_box_reg: 0.0551 (0.1152)  time: 0.4706 (0.4539)  data: 0.0143 (0.0177)  cpu2cuda: 0.1006 (0.1154)  forward: 0.2128 (0.2179)  backward: 0.0852 (0.0826)  lr: 0.020000  max mem: 3150
INFO: eta: 11:16:18  iter: 780  loss: 1.3618 (1.4883)  loss_box_reg: 0.1646 (0.1377)  loss_classifier: 0.4730 (0.4513)  loss_mask: 0.5036 (0.5916)  loss_objectness: 0.1263 (0.1930)  loss_rpn_box_reg: 0.0764 (0.1147)  time: 0.4732 (0.4548)  data: 0.0148 (0.0177)  cpu2cuda: 0.1088 (0.1153)  forward: 0.2292 (0.2182)  backward: 0.0815 (0.0830)  lr: 0.020000  max mem: 3150
INFO: eta: 11:16:35  iter: 800  loss: 1.3844 (1.4862)  loss_box_reg: 0.1618 (0.1386)  loss_classifier: 0.4666 (0.4521)  loss_mask: 0.5074 (0.5899)  loss_objectness: 0.1282 (0.1914)  loss_rpn_box_reg: 0.0784 (0.1143)  time: 0.4400 (0.4551)  data: 0.0171 (0.0178)  cpu2cuda: 0.1069 (0.1152)  forward: 0.2171 (0.2182)  backward: 0.0777 (0.0831)  lr: 0.020000  max mem: 3150
INFO: eta: 11:18:20  iter: 820  loss: 1.3302 (1.4839)  loss_box_reg: 0.1663 (0.1395)  loss_classifier: 0.4627 (0.4532)  loss_mask: 0.4746 (0.5873)  loss_objectness: 0.1106 (0.1901)  loss_rpn_box_reg: 0.0857 (0.1139)  time: 0.4915 (0.4564)  data: 0.0134 (0.0177)  cpu2cuda: 0.1099 (0.1151)  forward: 0.2168 (0.2185)  backward: 0.0947 (0.0836)  lr: 0.020000  max mem: 3150
INFO: eta: 11:18:16  iter: 840  loss: 1.3802 (1.4815)  loss_box_reg: 0.1743 (0.1402)  loss_classifier: 0.4432 (0.4537)  loss_mask: 0.4910 (0.5851)  loss_objectness: 0.1341 (0.1889)  loss_rpn_box_reg: 0.0609 (0.1136)  time: 0.4388 (0.4564)  data: 0.0175 (0.0178)  cpu2cuda: 0.0897 (0.1147)  forward: 0.2041 (0.2187)  backward: 0.0834 (0.0839)  lr: 0.020000  max mem: 3150
INFO: eta: 11:19:33  iter: 860  loss: 1.4601 (1.4808)  loss_box_reg: 0.1751 (0.1413)  loss_classifier: 0.5000 (0.4550)  loss_mask: 0.4871 (0.5829)  loss_objectness: 0.1441 (0.1881)  loss_rpn_box_reg: 0.0912 (0.1136)  time: 0.4993 (0.4574)  data: 0.0150 (0.0178)  cpu2cuda: 0.1059 (0.1145)  forward: 0.2173 (0.2192)  backward: 0.0958 (0.0844)  lr: 0.020000  max mem: 3150
INFO: eta: 11:19:36  iter: 880  loss: 1.3573 (1.4778)  loss_box_reg: 0.1448 (0.1416)  loss_classifier: 0.4136 (0.4546)  loss_mask: 0.4688 (0.5801)  loss_objectness: 0.1379 (0.1874)  loss_rpn_box_reg: 0.0714 (0.1140)  time: 0.4696 (0.4575)  data: 0.0168 (0.0178)  cpu2cuda: 0.1092 (0.1144)  forward: 0.2263 (0.2193)  backward: 0.0838 (0.0845)  lr: 0.020000  max mem: 3150
INFO: eta: 11:19:41  iter: 900  loss: 1.3399 (1.4742)  loss_box_reg: 0.1298 (0.1415)  loss_classifier: 0.3774 (0.4540)  loss_mask: 0.4728 (0.5781)  loss_objectness: 0.1435 (0.1865)  loss_rpn_box_reg: 0.0996 (0.1141)  time: 0.4442 (0.4577)  data: 0.0133 (0.0177)  cpu2cuda: 0.1196 (0.1144)  forward: 0.2141 (0.2196)  backward: 0.0782 (0.0846)  lr: 0.020000  max mem: 3150
INFO: eta: 11:19:38  iter: 920  loss: 1.2642 (1.4721)  loss_box_reg: 0.1208 (0.1414)  loss_classifier: 0.3592 (0.4533)  loss_mask: 0.4774 (0.5758)  loss_objectness: 0.1824 (0.1869)  loss_rpn_box_reg: 0.0963 (0.1146)  time: 0.4458 (0.4578)  data: 0.0177 (0.0178)  cpu2cuda: 0.1147 (0.1144)  forward: 0.2019 (0.2194)  backward: 0.0695 (0.0845)  lr: 0.020000  max mem: 3150
INFO: eta: 11:18:38  iter: 940  loss: 1.2608 (1.4677)  loss_box_reg: 0.1143 (0.1410)  loss_classifier: 0.3606 (0.4519)  loss_mask: 0.4795 (0.5738)  loss_objectness: 0.1210 (0.1866)  loss_rpn_box_reg: 0.0533 (0.1143)  time: 0.4236 (0.4572)  data: 0.0107 (0.0177)  cpu2cuda: 0.1272 (0.1147)  forward: 0.1952 (0.2190)  backward: 0.0629 (0.0842)  lr: 0.020000  max mem: 3150
INFO: eta: 11:19:10  iter: 960  loss: 1.2979 (1.4660)  loss_box_reg: 0.1443 (0.1413)  loss_classifier: 0.4196 (0.4514)  loss_mask: 0.5019 (0.5722)  loss_objectness: 0.1670 (0.1869)  loss_rpn_box_reg: 0.0812 (0.1142)  time: 0.4584 (0.4577)  data: 0.0155 (0.0178)  cpu2cuda: 0.1085 (0.1145)  forward: 0.2284 (0.2192)  backward: 0.0941 (0.0845)  lr: 0.020000  max mem: 3150
INFO: eta: 11:19:37  iter: 980  loss: 1.4718 (1.4667)  loss_box_reg: 0.1443 (0.1417)  loss_classifier: 0.4877 (0.4525)  loss_mask: 0.5164 (0.5710)  loss_objectness: 0.1626 (0.1870)  loss_rpn_box_reg: 0.1266 (0.1146)  time: 0.4362 (0.4581)  data: 0.0144 (0.0177)  cpu2cuda: 0.1171 (0.1144)  forward: 0.2035 (0.2194)  backward: 0.0738 (0.0848)  lr: 0.020000  max mem: 3150
INFO: eta: 11:20:22  iter: 1000  loss: 1.4057 (1.4662)  loss_box_reg: 0.1492 (0.1421)  loss_classifier: 0.4260 (0.4529)  loss_mask: 0.4791 (0.5692)  loss_objectness: 0.1729 (0.1870)  loss_rpn_box_reg: 0.1009 (0.1150)  time: 0.4639 (0.4587)  data: 0.0128 (0.0177)  cpu2cuda: 0.1192 (0.1143)  forward: 0.2081 (0.2196)  backward: 0.0833 (0.0849)  lr: 0.020000  max mem: 3150
INFO: eta: 11:20:31  iter: 1020  loss: 1.2775 (1.4635)  loss_box_reg: 0.1344 (0.1421)  loss_classifier: 0.3819 (0.4522)  loss_mask: 0.4858 (0.5676)  loss_objectness: 0.1292 (0.1863)  loss_rpn_box_reg: 0.0710 (0.1152)  time: 0.4564 (0.4589)  data: 0.0145 (0.0177)  cpu2cuda: 0.1289 (0.1145)  forward: 0.2042 (0.2194)  backward: 0.0691 (0.0849)  lr: 0.020000  max mem: 3150
INFO: eta: 11:20:30  iter: 1040  loss: 1.1655 (1.4600)  loss_box_reg: 0.1561 (0.1424)  loss_classifier: 0.4020 (0.4515)  loss_mask: 0.4736 (0.5660)  loss_objectness: 0.1026 (0.1854)  loss_rpn_box_reg: 0.0424 (0.1146)  time: 0.4324 (0.4590)  data: 0.0142 (0.0177)  cpu2cuda: 0.1193 (0.1145)  forward: 0.2108 (0.2195)  backward: 0.0726 (0.0849)  lr: 0.020000  max mem: 3150
INFO: eta: 11:20:49  iter: 1060  loss: 1.2989 (1.4586)  loss_box_reg: 0.1455 (0.1427)  loss_classifier: 0.4428 (0.4519)  loss_mask: 0.5101 (0.5646)  loss_objectness: 0.1336 (0.1849)  loss_rpn_box_reg: 0.0955 (0.1147)  time: 0.4536 (0.4593)  data: 0.0144 (0.0177)  cpu2cuda: 0.1153 (0.1145)  forward: 0.2172 (0.2196)  backward: 0.0756 (0.0849)  lr: 0.020000  max mem: 3150
INFO: eta: 11:20:51  iter: 1080  loss: 1.2639 (1.4548)  loss_box_reg: 0.1201 (0.1425)  loss_classifier: 0.3473 (0.4505)  loss_mask: 0.4856 (0.5633)  loss_objectness: 0.1286 (0.1841)  loss_rpn_box_reg: 0.0788 (0.1145)  time: 0.4443 (0.4594)  data: 0.0156 (0.0177)  cpu2cuda: 0.1070 (0.1144)  forward: 0.2117 (0.2197)  backward: 0.0765 (0.0850)  lr: 0.020000  max mem: 3150
INFO: eta: 11:20:46  iter: 1100  loss: 1.1469 (1.4501)  loss_box_reg: 0.1215 (0.1424)  loss_classifier: 0.3832 (0.4499)  loss_mask: 0.4812 (0.5616)  loss_objectness: 0.0790 (0.1825)  loss_rpn_box_reg: 0.0600 (0.1136)  time: 0.4421 (0.4595)  data: 0.0132 (0.0176)  cpu2cuda: 0.1212 (0.1145)  forward: 0.2037 (0.2197)  backward: 0.0682 (0.0849)  lr: 0.020000  max mem: 3150
INFO: eta: 11:21:28  iter: 1120  loss: 1.4617 (1.4511)  loss_box_reg: 0.1761 (0.1431)  loss_classifier: 0.5327 (0.4516)  loss_mask: 0.4678 (0.5598)  loss_objectness: 0.1397 (0.1830)  loss_rpn_box_reg: 0.0723 (0.1136)  time: 0.5088 (0.4600)  data: 0.0164 (0.0177)  cpu2cuda: 0.1160 (0.1145)  forward: 0.2219 (0.2200)  backward: 0.0810 (0.0850)  lr: 0.020000  max mem: 3150
INFO: eta: 11:21:51  iter: 1140  loss: 1.3548 (1.4514)  loss_box_reg: 0.1438 (0.1434)  loss_classifier: 0.4288 (0.4518)  loss_mask: 0.4968 (0.5587)  loss_objectness: 0.1977 (0.1836)  loss_rpn_box_reg: 0.0669 (0.1139)  time: 0.4698 (0.4604)  data: 0.0176 (0.0177)  cpu2cuda: 0.1157 (0.1144)  forward: 0.2209 (0.2202)  backward: 0.0814 (0.0853)  lr: 0.020000  max mem: 3273
INFO: eta: 11:22:26  iter: 1160  loss: 1.3427 (1.4511)  loss_box_reg: 0.1638 (0.1441)  loss_classifier: 0.4600 (0.4529)  loss_mask: 0.4664 (0.5572)  loss_objectness: 0.1262 (0.1832)  loss_rpn_box_reg: 0.0936 (0.1138)  time: 0.4809 (0.4609)  data: 0.0137 (0.0177)  cpu2cuda: 0.1062 (0.1142)  forward: 0.2181 (0.2206)  backward: 0.0942 (0.0855)  lr: 0.020000  max mem: 3273
INFO: eta: 11:22:50  iter: 1180  loss: 1.3135 (1.4501)  loss_box_reg: 0.1386 (0.1441)  loss_classifier: 0.4650 (0.4528)  loss_mask: 0.4468 (0.5553)  loss_objectness: 0.1685 (0.1835)  loss_rpn_box_reg: 0.0692 (0.1143)  time: 0.4649 (0.4613)  data: 0.0138 (0.0177)  cpu2cuda: 0.1181 (0.1142)  forward: 0.2101 (0.2208)  backward: 0.0678 (0.0856)  lr: 0.020000  max mem: 3273
INFO: eta: 11:23:10  iter: 1200  loss: 1.2336 (1.4477)  loss_box_reg: 0.1621 (0.1444)  loss_classifier: 0.4282 (0.4529)  loss_mask: 0.4587 (0.5539)  loss_objectness: 0.1211 (0.1826)  loss_rpn_box_reg: 0.0661 (0.1139)  time: 0.4582 (0.4616)  data: 0.0152 (0.0177)  cpu2cuda: 0.1007 (0.1141)  forward: 0.2152 (0.2210)  backward: 0.0837 (0.0857)  lr: 0.020000  max mem: 3273
INFO: eta: 11:23:02  iter: 1220  loss: 1.2236 (1.4450)  loss_box_reg: 0.1330 (0.1442)  loss_classifier: 0.4394 (0.4526)  loss_mask: 0.4318 (0.5519)  loss_objectness: 0.1276 (0.1822)  loss_rpn_box_reg: 0.0969 (0.1142)  time: 0.4600 (0.4616)  data: 0.0140 (0.0177)  cpu2cuda: 0.1153 (0.1141)  forward: 0.2002 (0.2209)  backward: 0.0702 (0.0856)  lr: 0.020000  max mem: 3273
INFO: eta: 11:23:40  iter: 1240  loss: 1.1518 (1.4422)  loss_box_reg: 0.1457 (0.1445)  loss_classifier: 0.3850 (0.4525)  loss_mask: 0.4426 (0.5502)  loss_objectness: 0.1243 (0.1813)  loss_rpn_box_reg: 0.0642 (0.1138)  time: 0.4834 (0.4622)  data: 0.0200 (0.0178)  cpu2cuda: 0.1100 (0.1141)  forward: 0.2294 (0.2213)  backward: 0.0821 (0.0858)  lr: 0.020000  max mem: 3273
INFO: eta: 11:24:01  iter: 1260  loss: 1.2766 (1.4402)  loss_box_reg: 0.1349 (0.1446)  loss_classifier: 0.4305 (0.4528)  loss_mask: 0.4346 (0.5487)  loss_objectness: 0.1248 (0.1806)  loss_rpn_box_reg: 0.0866 (0.1135)  time: 0.5034 (0.4625)  data: 0.0151 (0.0178)  cpu2cuda: 0.1135 (0.1141)  forward: 0.2074 (0.2212)  backward: 0.0744 (0.0859)  lr: 0.020000  max mem: 3273
INFO: eta: 11:24:09  iter: 1280  loss: 1.2564 (1.4391)  loss_box_reg: 0.1322 (0.1443)  loss_classifier: 0.3514 (0.4517)  loss_mask: 0.4578 (0.5475)  loss_objectness: 0.1377 (0.1821)  loss_rpn_box_reg: 0.0860 (0.1134)  time: 0.4556 (0.4627)  data: 0.0131 (0.0178)  cpu2cuda: 0.1150 (0.1141)  forward: 0.2050 (0.2211)  backward: 0.0666 (0.0859)  lr: 0.020000  max mem: 3273
INFO: eta: 11:24:00  iter: 1300  loss: 1.3876 (1.4396)  loss_box_reg: 0.1307 (0.1442)  loss_classifier: 0.4316 (0.4515)  loss_mask: 0.4962 (0.5466)  loss_objectness: 0.2181 (0.1836)  loss_rpn_box_reg: 0.1055 (0.1137)  time: 0.4579 (0.4627)  data: 0.0133 (0.0178)  cpu2cuda: 0.1253 (0.1143)  forward: 0.2055 (0.2210)  backward: 0.0739 (0.0857)  lr: 0.020000  max mem: 3273
INFO: eta: 11:24:31  iter: 1320  loss: 1.4171 (1.4392)  loss_box_reg: 0.1705 (0.1445)  loss_classifier: 0.4820 (0.4518)  loss_mask: 0.4673 (0.5454)  loss_objectness: 0.1533 (0.1836)  loss_rpn_box_reg: 0.0901 (0.1139)  time: 0.4649 (0.4631)  data: 0.0161 (0.0178)  cpu2cuda: 0.1141 (0.1142)  forward: 0.2058 (0.2213)  backward: 0.0842 (0.0859)  lr: 0.020000  max mem: 3273
INFO: eta: 11:24:10  iter: 1340  loss: 1.2669 (1.4367)  loss_box_reg: 0.1360 (0.1443)  loss_classifier: 0.4076 (0.4511)  loss_mask: 0.4557 (0.5442)  loss_objectness: 0.1463 (0.1832)  loss_rpn_box_reg: 0.0800 (0.1140)  time: 0.4322 (0.4630)  data: 0.0148 (0.0178)  cpu2cuda: 0.1205 (0.1142)  forward: 0.2119 (0.2213)  backward: 0.0804 (0.0860)  lr: 0.020000  max mem: 3273
INFO: eta: 11:24:12  iter: 1360  loss: 1.4154 (1.4355)  loss_box_reg: 0.1476 (0.1444)  loss_classifier: 0.4398 (0.4512)  loss_mask: 0.4522 (0.5430)  loss_objectness: 0.1275 (0.1826)  loss_rpn_box_reg: 0.0926 (0.1143)  time: 0.4675 (0.4631)  data: 0.0144 (0.0178)  cpu2cuda: 0.1142 (0.1141)  forward: 0.2017 (0.2215)  backward: 0.0797 (0.0860)  lr: 0.020000  max mem: 3273
INFO: eta: 11:24:07  iter: 1380  loss: 1.2346 (1.4354)  loss_box_reg: 0.1460 (0.1446)  loss_classifier: 0.4672 (0.4513)  loss_mask: 0.4896 (0.5423)  loss_objectness: 0.1470 (0.1828)  loss_rpn_box_reg: 0.0816 (0.1144)  time: 0.4637 (0.4632)  data: 0.0211 (0.0179)  cpu2cuda: 0.0922 (0.1139)  forward: 0.2307 (0.2216)  backward: 0.0882 (0.0862)  lr: 0.020000  max mem: 3273
INFO: eta: 11:24:32  iter: 1400  loss: 1.2104 (1.4343)  loss_box_reg: 0.1273 (0.1447)  loss_classifier: 0.4121 (0.4516)  loss_mask: 0.4433 (0.5410)  loss_objectness: 0.1626 (0.1823)  loss_rpn_box_reg: 0.1023 (0.1147)  time: 0.4961 (0.4636)  data: 0.0140 (0.0179)  cpu2cuda: 0.1189 (0.1139)  forward: 0.2141 (0.2217)  backward: 0.0791 (0.0863)  lr: 0.020000  max mem: 3273
INFO: eta: 11:25:34  iter: 1420  loss: 1.3347 (1.4339)  loss_box_reg: 0.1865 (0.1452)  loss_classifier: 0.5047 (0.4522)  loss_mask: 0.4837 (0.5399)  loss_objectness: 0.0887 (0.1819)  loss_rpn_box_reg: 0.0901 (0.1147)  time: 0.5229 (0.4644)  data: 0.0149 (0.0179)  cpu2cuda: 0.1155 (0.1139)  forward: 0.2449 (0.2222)  backward: 0.0855 (0.0864)  lr: 0.020000  max mem: 3273
INFO: eta: 11:25:56  iter: 1440  loss: 1.3153 (1.4326)  loss_box_reg: 0.1629 (0.1455)  loss_classifier: 0.4455 (0.4523)  loss_mask: 0.4635 (0.5388)  loss_objectness: 0.1359 (0.1815)  loss_rpn_box_reg: 0.0972 (0.1145)  time: 0.4920 (0.4647)  data: 0.0163 (0.0179)  cpu2cuda: 0.1200 (0.1138)  forward: 0.2206 (0.2223)  backward: 0.0826 (0.0866)  lr: 0.020000  max mem: 3273
INFO: eta: 11:25:47  iter: 1460  loss: 1.2140 (1.4312)  loss_box_reg: 0.1361 (0.1456)  loss_classifier: 0.4297 (0.4523)  loss_mask: 0.4435 (0.5377)  loss_objectness: 0.1296 (0.1812)  loss_rpn_box_reg: 0.0594 (0.1143)  time: 0.4533 (0.4647)  data: 0.0170 (0.0179)  cpu2cuda: 0.1152 (0.1139)  forward: 0.2163 (0.2225)  backward: 0.0715 (0.0864)  lr: 0.020000  max mem: 3273
INFO: eta: 11:26:06  iter: 1480  loss: 1.3437 (1.4296)  loss_box_reg: 0.1438 (0.1457)  loss_classifier: 0.4536 (0.4523)  loss_mask: 0.4508 (0.5367)  loss_objectness: 0.0970 (0.1808)  loss_rpn_box_reg: 0.0583 (0.1142)  time: 0.4664 (0.4651)  data: 0.0153 (0.0179)  cpu2cuda: 0.1187 (0.1138)  forward: 0.2284 (0.2228)  backward: 0.0781 (0.0866)  lr: 0.020000  max mem: 3273
INFO: eta: 11:26:02  iter: 1500  loss: 1.2773 (1.4286)  loss_box_reg: 0.1214 (0.1458)  loss_classifier: 0.4077 (0.4524)  loss_mask: 0.4667 (0.5358)  loss_objectness: 0.1209 (0.1804)  loss_rpn_box_reg: 0.0908 (0.1142)  time: 0.4579 (0.4651)  data: 0.0131 (0.0179)  cpu2cuda: 0.1286 (0.1140)  forward: 0.2033 (0.2226)  backward: 0.0657 (0.0864)  lr: 0.020000  max mem: 3273
INFO: eta: 11:26:05  iter: 1520  loss: 1.3348 (1.4270)  loss_box_reg: 0.1425 (0.1460)  loss_classifier: 0.4405 (0.4524)  loss_mask: 0.4363 (0.5346)  loss_objectness: 0.1139 (0.1799)  loss_rpn_box_reg: 0.0849 (0.1141)  time: 0.4378 (0.4653)  data: 0.0135 (0.0179)  cpu2cuda: 0.0996 (0.1139)  forward: 0.2083 (0.2227)  backward: 0.0900 (0.0866)  lr: 0.020000  max mem: 3273
INFO: eta: 11:25:59  iter: 1540  loss: 1.2242 (1.4244)  loss_box_reg: 0.1164 (0.1457)  loss_classifier: 0.3341 (0.4513)  loss_mask: 0.4476 (0.5334)  loss_objectness: 0.1547 (0.1797)  loss_rpn_box_reg: 0.0962 (0.1143)  time: 0.4524 (0.4653)  data: 0.0150 (0.0179)  cpu2cuda: 0.1201 (0.1139)  forward: 0.2106 (0.2227)  backward: 0.0721 (0.0866)  lr: 0.020000  max mem: 3273
INFO: eta: 11:26:32  iter: 1560  loss: 1.3610 (1.4243)  loss_box_reg: 0.1626 (0.1461)  loss_classifier: 0.4904 (0.4524)  loss_mask: 0.4439 (0.5324)  loss_objectness: 0.1352 (0.1792)  loss_rpn_box_reg: 0.0989 (0.1141)  time: 0.5037 (0.4658)  data: 0.0138 (0.0179)  cpu2cuda: 0.1173 (0.1139)  forward: 0.2162 (0.2229)  backward: 0.0907 (0.0868)  lr: 0.020000  max mem: 3273
INFO: eta: 11:26:43  iter: 1580  loss: 1.3862 (1.4230)  loss_box_reg: 0.1435 (0.1462)  loss_classifier: 0.4196 (0.4523)  loss_mask: 0.4290 (0.5312)  loss_objectness: 0.1650 (0.1791)  loss_rpn_box_reg: 0.0776 (0.1142)  time: 0.4742 (0.4660)  data: 0.0170 (0.0179)  cpu2cuda: 0.1283 (0.1140)  forward: 0.2213 (0.2230)  backward: 0.0790 (0.0868)  lr: 0.020000  max mem: 3273
INFO: eta: 11:27:19  iter: 1600  loss: 1.2703 (1.4213)  loss_box_reg: 0.1463 (0.1463)  loss_classifier: 0.4256 (0.4521)  loss_mask: 0.4321 (0.5300)  loss_objectness: 0.1267 (0.1788)  loss_rpn_box_reg: 0.1001 (0.1142)  time: 0.4736 (0.4665)  data: 0.0137 (0.0179)  cpu2cuda: 0.1188 (0.1139)  forward: 0.2191 (0.2231)  backward: 0.0753 (0.0870)  lr: 0.020000  max mem: 3273
INFO: eta: 11:27:46  iter: 1620  loss: 1.2103 (1.4194)  loss_box_reg: 0.1495 (0.1464)  loss_classifier: 0.4146 (0.4521)  loss_mask: 0.4290 (0.5289)  loss_objectness: 0.1248 (0.1781)  loss_rpn_box_reg: 0.0973 (0.1140)  time: 0.4593 (0.4669)  data: 0.0129 (0.0179)  cpu2cuda: 0.1048 (0.1139)  forward: 0.2447 (0.2236)  backward: 0.0796 (0.0871)  lr: 0.020000  max mem: 3273
INFO: eta: 11:27:49  iter: 1640  loss: 1.1808 (1.4179)  loss_box_reg: 0.1302 (0.1465)  loss_classifier: 0.3935 (0.4517)  loss_mask: 0.4449 (0.5280)  loss_objectness: 0.1249 (0.1776)  loss_rpn_box_reg: 0.0955 (0.1141)  time: 0.4538 (0.4671)  data: 0.0185 (0.0179)  cpu2cuda: 0.1081 (0.1136)  forward: 0.2301 (0.2238)  backward: 0.0819 (0.0872)  lr: 0.020000  max mem: 3273
INFO: eta: 11:27:48  iter: 1660  loss: 1.0554 (1.4155)  loss_box_reg: 0.1203 (0.1464)  loss_classifier: 0.3368 (0.4510)  loss_mask: 0.4296 (0.5271)  loss_objectness: 0.1124 (0.1771)  loss_rpn_box_reg: 0.0874 (0.1139)  time: 0.4631 (0.4672)  data: 0.0145 (0.0180)  cpu2cuda: 0.1190 (0.1136)  forward: 0.2103 (0.2238)  backward: 0.0777 (0.0872)  lr: 0.020000  max mem: 3273
INFO: eta: 11:28:14  iter: 1680  loss: 1.1996 (1.4132)  loss_box_reg: 0.1500 (0.1467)  loss_classifier: 0.4520 (0.4512)  loss_mask: 0.4379 (0.5260)  loss_objectness: 0.0933 (0.1762)  loss_rpn_box_reg: 0.0581 (0.1133)  time: 0.4892 (0.4676)  data: 0.0174 (0.0180)  cpu2cuda: 0.1146 (0.1136)  forward: 0.2311 (0.2240)  backward: 0.0849 (0.0873)  lr: 0.020000  max mem: 3273
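
The rising `eta` in the log above is consistent with the running average of `time` (the value in parentheses) creeping up from 0.4653 to 0.4676. A minimal sketch of that relationship follows. It assumes the trainer computes the ETA as the running-average step time multiplied by the remaining iterations, and that the 1x schedule runs for 90000 iterations. Neither assumption is stated in this thread, and `parse_meters`, `estimate_eta`, and `MAX_ITER` are illustrative names.

```python
import re
from datetime import timedelta

# One line copied from the log above (loss fields trimmed for brevity).
LINE = ("INFO: eta: 11:26:05  iter: 1520  time: 0.4378 (0.4653)  "
        "data: 0.0135 (0.0179)  lr: 0.020000  max mem: 3273")

MAX_ITER = 90000  # assumption: iteration count of the 1x schedule


def parse_meters(line):
    """Return {name: (latest, running_average)} for fields like 'time: 0.4378 (0.4653)'."""
    pattern = re.compile(r"(\w+): ([\d.]+) \(([\d.]+)\)")
    return {name: (float(cur), float(avg)) for name, cur, avg in pattern.findall(line)}


def estimate_eta(line, max_iter=MAX_ITER):
    """Estimate remaining time as running-average step time x remaining iterations."""
    iteration = int(re.search(r"iter: (\d+)", line).group(1))
    _, avg_time = parse_meters(line)["time"]
    return timedelta(seconds=int(avg_time * (max_iter - iteration)))


# Lands within a few seconds of the logged eta; the gap comes from the
# running average being rounded to four decimals in the log.
print(estimate_eta(LINE))
```

Under these assumptions, a small drift in the averaged step time moves the ETA by minutes, because it is multiplied by roughly 88000 remaining iterations.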

The GPU memory consumption reported by nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:05:00.0 Off |                    0 |
| N/A   49C    P0   139W / 250W |   5925MiB / 16276MiB |     57%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:87:00.0 Off |                    0 |
| N/A   51C    P0   144W / 250W |   6611MiB / 16276MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     94173      C   .../anaconda3/bin/python  5915MiB                   |
|    1     94174      C   .../anaconda3/bin/python  6601MiB                   |
+-----------------------------------------------------------------------------+
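
The `max mem` column and nvidia-smi most likely measure different things. The sketch below assumes that `max mem` is the peak tensor memory allocated through PyTorch (as reported by `torch.cuda.max_memory_allocated()`), while nvidia-smi shows everything the process holds on the device, including blocks cached by PyTorch's allocator and the CUDA context. The helper names `to_mib` and `summarize` are illustrative, and the GPU part only runs when PyTorch and a CUDA device are present.

```python
def to_mib(num_bytes):
    """Convert a byte count to MiB, the unit used by both the log and nvidia-smi."""
    return num_bytes / (1024.0 * 1024.0)


def summarize(peak_allocated_bytes, reserved_bytes):
    """Format the two PyTorch-side counters next to each other."""
    return "peak allocated: {:.0f} MiB | reserved by allocator: {:.0f} MiB".format(
        to_mib(peak_allocated_bytes), to_mib(reserved_bytes))


try:
    import torch  # optional: the helpers above work without it
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    scratch = torch.empty(64, 1024, 1024, device="cuda")  # 256 MiB of float32
    del scratch  # freed tensor memory stays cached by PyTorch, not returned to the driver
    # memory_reserved is the newer name; older releases such as 1.1 expose memory_cached.
    reserved_fn = getattr(torch.cuda, "memory_reserved", None) or torch.cuda.memory_cached
    print(summarize(torch.cuda.max_memory_allocated(), reserved_fn()))
```

If the reserved figure is well above the peak allocated figure, that would account for part of the gap between the 3273 in the log and the 5915MiB shown for the process; the rest would be memory that PyTorch's counters do not track.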

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 12 (11 by maintainers)

Top GitHub Comments

1 reaction
Johnqczhang commented, Jul 26, 2019

In the end, it took me more than 2 days to finish training Keypoints R-CNN with 2 P100 GPUs, and I got performance on the coco_minival2014 dataset similar to what is reported in MODEL_ZOO.md:

maskrcnn/maskrcnn_benchmark/engine/trainer.py: 115 INFO: eta: 0:00:09  iter: 359980  loss: 3.1357 (3.7096)  loss_box_reg: 0.0601 (0.0823)  loss_classifier: 0.0848 (0.1199)  loss_kp: 3.0231 (3.4649)  loss_objectness: 0.0114 (0.0211)  loss_rpn_box_reg: 0.0086 (0.0215)  time: 0.4939 (0.4815)  data: 0.0082 (0.0086)  cpu2cuda: 0.1205 (0.1392)  forward: 0.2140 (0.1972)  backward: 0.1491 (0.1221)  lr: 0.000050  max mem: 5099
maskrcnn/maskrcnn_benchmark/engine/trainer.py: 115 INFO: eta: 0:00:00  iter: 360000  loss: 2.9855 (3.7095)  loss_box_reg: 0.0491 (0.0823)  loss_classifier: 0.0808 (0.1199)  loss_kp: 2.8103 (3.4648)  loss_objectness: 0.0087 (0.0211)  loss_rpn_box_reg: 0.0087 (0.0215)  time: 0.4659 (0.4815)  data: 0.0079 (0.0086)  cpu2cuda: 0.1253 (0.1391)  forward: 0.1948 (0.1972)  backward: 0.1201 (0.1221)  lr: 0.000050  max mem: 5099
maskrcnn/maskrcnn_benchmark/utils/checkpoint.py:  48 INFO: Saving checkpoint to ./inference/model_0360000.pth
maskrcnn_benchmark/utils/checkpoint.py:  48 INFO: Saving checkpoint to ./inference/model_final.pth
maskrcnn_benchmark/engine/trainer.py: 127 INFO: Total training time: 2 days, 0:09:15.625577 (0.4815 s / it)                                                                                    
maskrcnn_benchmark/engine/inference.py:  85 INFO: Start evaluation on keypoints_coco_2014_minival dataset(5000 images).                                                                        
maskrcnn_benchmark/engine/inference.py:  97 INFO: Total run time: 0:04:00.126393 (0.09605055704116822 s / img per device, on 2 devices)
maskrcnn_benchmark/engine/inference.py: 105 INFO: Model inference time: 0:03:52.446272 (0.09297850875854492 s / img per device, on 2 devices)                                                  
maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py:  43 INFO: Preparing results for COCO format
maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py:  46 INFO: Preparing bbox results                                                                                                
maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py:  52 INFO: Preparing keypoints results
maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py:  59 INFO: Evaluating predictions                                                                                                
maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py:  69 INFO: Task: bbox
AP, AP50, AP75, APs, APm, APl                                                                                                                                                                  
0.5385, 0.8280, 0.5868, 0.3668, 0.6160, 0.6982
Task: keypoints                                                                                                                                                                                
AP, AP50, AP75, APm, APl
0.6461, 0.8616, 0.7060, 0.5949, 0.7280

Hi @fmassa, I’d appreciate it if you could tell me why my training time and speed differ so much from yours while the inference time is very similar (mine is even faster than yours). Could it be that the communication between the two GPUs (i.e., NCCL) is not working well in my setup? Looking forward to your reply, thank you very much!
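
One way to narrow down the question above is to see how the averaged step time splits across the logged phases. This is a minimal sketch using the running averages from the final training log line. It assumes the `data`, `cpu2cuda`, `forward`, and `backward` meters cover disjoint parts of each step, which the thread does not confirm, and `running_averages` and `phase_shares` are illustrative names.

```python
import re

# Timing fields from the last training log line above (iter 360000).
LINE = ("iter: 360000  time: 0.4659 (0.4815)  data: 0.0079 (0.0086)  "
        "cpu2cuda: 0.1253 (0.1391)  forward: 0.1948 (0.1972)  backward: 0.1201 (0.1221)")

PHASES = ("data", "cpu2cuda", "forward", "backward")


def running_averages(line):
    """Return {name: running_average} for fields like 'time: 0.4659 (0.4815)'."""
    return {name: float(avg)
            for name, avg in re.findall(r"(\w+): [\d.]+ \(([\d.]+)\)", line)}


def phase_shares(line):
    """Express each phase as a fraction of the averaged step time."""
    avgs = running_averages(line)
    total = avgs["time"]
    shares = {phase: avgs[phase] / total for phase in PHASES}
    # Whatever the four meters do not cover (optimizer step, logging, ...).
    shares["other"] = 1.0 - sum(shares.values())
    return shares


for name, share in phase_shares(LINE).items():
    print(f"{name:>9}: {share:6.1%}")
```

In this breakdown `cpu2cuda` takes a larger share of each step than `backward`. Because CUDA kernels launch asynchronously, wall-clock timers that are not preceded by `torch.cuda.synchronize()` can attribute GPU work to whichever later call happens to block, so the per-phase numbers are a hint rather than a precise profile.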

0 reactions
Jacobew commented, Oct 21, 2019

@sarahmass Yes, it did.


