Increasing batch time when training e2e_mask_rcnn_R_50_FPN_1x
❓ Questions and Help
Hi @fmassa, thanks for your implementation. When I retrain the e2e_mask_rcnn_R_50_FPN_1x model with the default configuration, the time per iteration increases gradually at the beginning of training, so I logged the time spent in each step of an iteration. As the log below shows, both the forward and the backward time per iteration increase as training goes on, and so does the “eta”.
Another confusing issue is that the GPU memory consumption also increases at the beginning of training. Although it stabilizes later, the value (3273 in my case) doesn’t match the memory consumption reported by nvidia-smi.
Thanks for your attention! 😃
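The growing eta is consistent with the growing average iteration time. Below is a minimal sketch of that arithmetic. It assumes the logger estimates the remaining time as the running-average iteration time multiplied by the number of remaining iterations, and that the 1x schedule runs for 90000 iterations. Neither assumption is stated in this post, and the helper name `estimate_eta` is hypothetical.

```python
import datetime

MAX_ITER = 90000  # assumed schedule length; not stated in the post


def estimate_eta(avg_iter_time, iteration, max_iter=MAX_ITER):
    """Remaining time if every remaining iteration takes avg_iter_time seconds."""
    remaining = max_iter - iteration
    return datetime.timedelta(seconds=int(avg_iter_time * remaining))


# (iteration, value in parentheses after "time:"), both copied from the log
for iteration, avg_time in [(20, 0.4940), (160, 0.4252), (1680, 0.4676)]:
    print(iteration, estimate_eta(avg_time, iteration))
```

Under these assumptions the estimates land within a few seconds of the logged eta values (the logged averages are rounded to four decimals), so an eta that creeps upward would simply mirror the average iteration time creeping upward.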
My environment information:
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Linux Server 7.2
GCC version: (GCC) 4.9.2
CMake version: version 3.8.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: Tesla P100-PCIE-16GB
GPU 1: Tesla P100-PCIE-16GB
Nvidia driver version: 384.145
cuDNN version: libcudnn.so.7.6.0
Versions of relevant libraries:
[pip] numpy==1.16.4
[pip] numpydoc==0.9.1
[pip] torch==1.1.0
[pip] torchvision==0.2.2
[conda] blas 1.0 mkl
[conda] mkl 2019.4 243
[conda] mkl-include 2019.4 243
[conda] mkl-service 2.0.2 py37h7b6447c_0
[conda] mkl_fft 1.0.12 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] torch 1.1.0 pypi_0 pypi
Pillow (6.1.0)
The training log of e2e_mask_rcnn_R_50_FPN_1x:
INFO: Start training
INFO: eta: 12:20:53 iter: 20 loss: 1.8344 (2.1594) loss_box_reg: 0.0809 (0.0799) loss_classifier: 0.4756 (0.7001) loss_mask: 0.7108 (0.7706) loss_objectness: 0.3915 (0.4632) loss_rpn_box_reg: 0.0932 (0.1456) time: 0.4128 (0.4940) data: 0.0124 (0.0459) cpu2cuda: 0.1303 (0.1154) forward: 0.1933 (0.2479) backward: 0.0574 (0.0698) lr: 0.007200 max mem: 2957
INFO: eta: 11:20:44 iter: 40 loss: 1.4896 (1.8123) loss_box_reg: 0.0897 (0.0842) loss_classifier: 0.4175 (0.5472) loss_mask: 0.6901 (0.7303) loss_objectness: 0.2093 (0.3408) loss_rpn_box_reg: 0.0544 (0.1098) time: 0.3988 (0.4540) data: 0.0095 (0.0287) cpu2cuda: 0.1340 (0.1209) forward: 0.1884 (0.2195) backward: 0.0594 (0.0687) lr: 0.007733 max mem: 2957
INFO: eta: 11:08:29 iter: 60 loss: 1.4800 (1.7611) loss_box_reg: 0.0832 (0.0933) loss_classifier: 0.3759 (0.5120) loss_mask: 0.6892 (0.7161) loss_objectness: 0.2313 (0.3252) loss_rpn_box_reg: 0.0897 (0.1146) time: 0.4268 (0.4460) data: 0.0146 (0.0246) cpu2cuda: 0.1180 (0.1180) forward: 0.1988 (0.2151) backward: 0.0683 (0.0728) lr: 0.008267 max mem: 2957
INFO: eta: 10:51:55 iter: 80 loss: 1.2965 (1.6597) loss_box_reg: 0.0588 (0.0884) loss_classifier: 0.2801 (0.4598) loss_mask: 0.6854 (0.7076) loss_objectness: 0.1827 (0.2940) loss_rpn_box_reg: 0.0745 (0.1099) time: 0.3816 (0.4350) data: 0.0092 (0.0210) cpu2cuda: 0.1328 (0.1212) forward: 0.1893 (0.2088) backward: 0.0525 (0.0694) lr: 0.008800 max mem: 2957
INFO: eta: 10:42:00 iter: 100 loss: 1.3173 (1.5981) loss_box_reg: 0.0571 (0.0844) loss_classifier: 0.2880 (0.4290) loss_mask: 0.6679 (0.6994) loss_objectness: 0.1850 (0.2749) loss_rpn_box_reg: 0.1171 (0.1105) time: 0.4012 (0.4285) data: 0.0097 (0.0189) cpu2cuda: 0.1339 (0.1242) forward: 0.1873 (0.2047) backward: 0.0537 (0.0665) lr: 0.009333 max mem: 2957
INFO: eta: 10:39:14 iter: 120 loss: 1.3781 (1.5678) loss_box_reg: 0.0747 (0.0837) loss_classifier: 0.2876 (0.4133) loss_mask: 0.6767 (0.6946) loss_objectness: 0.2126 (0.2666) loss_rpn_box_reg: 0.1049 (0.1095) time: 0.4035 (0.4267) data: 0.0107 (0.0177) cpu2cuda: 0.1307 (0.1258) forward: 0.1943 (0.2034) backward: 0.0552 (0.0657) lr: 0.009867 max mem: 2957
INFO: eta: 10:37:59 iter: 140 loss: 1.4151 (1.5545) loss_box_reg: 0.0892 (0.0864) loss_classifier: 0.3723 (0.4117) loss_mask: 0.6731 (0.6920) loss_objectness: 0.1878 (0.2565) loss_rpn_box_reg: 0.0732 (0.1079) time: 0.4150 (0.4260) data: 0.0112 (0.0169) cpu2cuda: 0.1350 (0.1266) forward: 0.1986 (0.2031) backward: 0.0583 (0.0654) lr: 0.010400 max mem: 2957
INFO: eta: 10:36:36 iter: 160 loss: 1.3080 (1.5355) loss_box_reg: 0.0709 (0.0850) loss_classifier: 0.3262 (0.4041) loss_mask: 0.6504 (0.6872) loss_objectness: 0.1965 (0.2505) loss_rpn_box_reg: 0.0886 (0.1087) time: 0.4124 (0.4252) data: 0.0097 (0.0167) cpu2cuda: 0.1293 (0.1265) forward: 0.1983 (0.2030) backward: 0.0583 (0.0656) lr: 0.010933 max mem: 2957
INFO: eta: 10:40:31 iter: 180 loss: 1.4487 (1.5269) loss_box_reg: 0.1099 (0.0890) loss_classifier: 0.3957 (0.4051) loss_mask: 0.6496 (0.6833) loss_objectness: 0.1350 (0.2418) loss_rpn_box_reg: 0.0967 (0.1076) time: 0.4232 (0.4279) data: 0.0115 (0.0163) cpu2cuda: 0.1287 (0.1259) forward: 0.1917 (0.2033) backward: 0.0645 (0.0672) lr: 0.011467 max mem: 2957
INFO: eta: 10:39:20 iter: 200 loss: 1.3171 (1.5171) loss_box_reg: 0.1233 (0.0926) loss_classifier: 0.3784 (0.4066) loss_mask: 0.6538 (0.6797) loss_objectness: 0.1193 (0.2324) loss_rpn_box_reg: 0.0555 (0.1059) time: 0.4112 (0.4272) data: 0.0131 (0.0161) cpu2cuda: 0.1209 (0.1253) forward: 0.1942 (0.2039) backward: 0.0596 (0.0675) lr: 0.012000 max mem: 2957
INFO: eta: 10:42:17 iter: 220 loss: 1.4505 (1.5276) loss_box_reg: 0.1381 (0.0981) loss_classifier: 0.4531 (0.4163) loss_mask: 0.6570 (0.6773) loss_objectness: 0.1509 (0.2292) loss_rpn_box_reg: 0.0731 (0.1068) time: 0.4373 (0.4292) data: 0.0137 (0.0163) cpu2cuda: 0.1250 (0.1241) forward: 0.2011 (0.2056) backward: 0.0675 (0.0692) lr: 0.012533 max mem: 2957
INFO: eta: 10:43:42 iter: 240 loss: 1.4663 (1.5232) loss_box_reg: 0.1024 (0.0994) loss_classifier: 0.4020 (0.4155) loss_mask: 0.6384 (0.6745) loss_objectness: 0.1889 (0.2267) loss_rpn_box_reg: 0.0918 (0.1071) time: 0.4278 (0.4303) data: 0.0121 (0.0161) cpu2cuda: 0.1228 (0.1239) forward: 0.2125 (0.2065) backward: 0.0665 (0.0701) lr: 0.013067 max mem: 2957
INFO: eta: 10:46:49 iter: 260 loss: 1.3772 (1.5198) loss_box_reg: 0.1254 (0.1028) loss_classifier: 0.4421 (0.4177) loss_mask: 0.6168 (0.6702) loss_objectness: 0.1534 (0.2222) loss_rpn_box_reg: 0.0947 (0.1069) time: 0.4475 (0.4325) data: 0.0181 (0.0163) cpu2cuda: 0.1118 (0.1222) forward: 0.2209 (0.2092) backward: 0.0759 (0.0718) lr: 0.013600 max mem: 3131
INFO: eta: 10:48:30 iter: 280 loss: 1.4391 (1.5191) loss_box_reg: 0.1371 (0.1066) loss_classifier: 0.4118 (0.4199) loss_mask: 0.5962 (0.6656) loss_objectness: 0.1755 (0.2190) loss_rpn_box_reg: 0.0967 (0.1080) time: 0.4383 (0.4337) data: 0.0171 (0.0167) cpu2cuda: 0.1211 (0.1215) forward: 0.2095 (0.2101) backward: 0.0667 (0.0723) lr: 0.014133 max mem: 3131
INFO: eta: 10:51:17 iter: 300 loss: 1.4413 (1.5184) loss_box_reg: 0.1337 (0.1093) loss_classifier: 0.4201 (0.4218) loss_mask: 0.6180 (0.6621) loss_objectness: 0.1562 (0.2178) loss_rpn_box_reg: 0.0780 (0.1074) time: 0.4521 (0.4356) data: 0.0161 (0.0170) cpu2cuda: 0.1166 (0.1207) forward: 0.2226 (0.2109) backward: 0.0748 (0.0734) lr: 0.014667 max mem: 3131
INFO: eta: 10:53:10 iter: 320 loss: 1.4700 (1.5158) loss_box_reg: 0.1329 (0.1118) loss_classifier: 0.4295 (0.4238) loss_mask: 0.5635 (0.6574) loss_objectness: 0.1364 (0.2145) loss_rpn_box_reg: 0.0643 (0.1083) time: 0.4628 (0.4370) data: 0.0129 (0.0168) cpu2cuda: 0.1140 (0.1196) forward: 0.2092 (0.2118) backward: 0.0961 (0.0753) lr: 0.015200 max mem: 3131
INFO: eta: 10:55:10 iter: 340 loss: 1.5558 (1.5217) loss_box_reg: 0.1513 (0.1148) loss_classifier: 0.4643 (0.4279) loss_mask: 0.6037 (0.6541) loss_objectness: 0.1849 (0.2147) loss_rpn_box_reg: 0.0938 (0.1102) time: 0.4561 (0.4384) data: 0.0174 (0.0171) cpu2cuda: 0.1145 (0.1189) forward: 0.2164 (0.2120) backward: 0.0767 (0.0760) lr: 0.015733 max mem: 3131
INFO: eta: 10:56:08 iter: 360 loss: 1.3707 (1.5178) loss_box_reg: 0.1371 (0.1168) loss_classifier: 0.4722 (0.4301) loss_mask: 0.5698 (0.6502) loss_objectness: 0.1456 (0.2115) loss_rpn_box_reg: 0.0804 (0.1094) time: 0.4343 (0.4392) data: 0.0132 (0.0170) cpu2cuda: 0.1288 (0.1192) forward: 0.2081 (0.2118) backward: 0.0665 (0.0758) lr: 0.016267 max mem: 3131
INFO: eta: 11:00:07 iter: 380 loss: 1.6307 (1.5196) loss_box_reg: 0.1565 (0.1186) loss_classifier: 0.4888 (0.4319) loss_mask: 0.6033 (0.6476) loss_objectness: 0.1703 (0.2110) loss_rpn_box_reg: 0.1381 (0.1105) time: 0.4972 (0.4419) data: 0.0153 (0.0171) cpu2cuda: 0.1180 (0.1189) forward: 0.2079 (0.2127) backward: 0.0702 (0.0768) lr: 0.016800 max mem: 3131
INFO: eta: 11:02:06 iter: 400 loss: 1.3411 (1.5151) loss_box_reg: 0.1450 (0.1201) loss_classifier: 0.4000 (0.4316) loss_mask: 0.5971 (0.6449) loss_objectness: 0.1077 (0.2084) loss_rpn_box_reg: 0.0732 (0.1101) time: 0.4465 (0.4434) data: 0.0148 (0.0171) cpu2cuda: 0.0952 (0.1182) forward: 0.2020 (0.2125) backward: 0.0933 (0.0782) lr: 0.017333 max mem: 3131
INFO: eta: 11:01:45 iter: 420 loss: 1.4029 (1.5127) loss_box_reg: 0.1201 (0.1208) loss_classifier: 0.3815 (0.4302) loss_mask: 0.6024 (0.6433) loss_objectness: 0.1645 (0.2076) loss_rpn_box_reg: 0.1111 (0.1109) time: 0.4344 (0.4432) data: 0.0135 (0.0171) cpu2cuda: 0.1189 (0.1180) forward: 0.2010 (0.2124) backward: 0.0656 (0.0781) lr: 0.017867 max mem: 3131
INFO: eta: 11:03:44 iter: 440 loss: 1.3946 (1.5125) loss_box_reg: 0.1232 (0.1219) loss_classifier: 0.4124 (0.4324) loss_mask: 0.5625 (0.6399) loss_objectness: 0.1774 (0.2062) loss_rpn_box_reg: 0.1292 (0.1122) time: 0.4359 (0.4447) data: 0.0158 (0.0173) cpu2cuda: 0.1210 (0.1180) forward: 0.2124 (0.2127) backward: 0.0695 (0.0784) lr: 0.018400 max mem: 3131
INFO: eta: 11:04:50 iter: 460 loss: 1.5121 (1.5134) loss_box_reg: 0.1594 (0.1242) loss_classifier: 0.4827 (0.4355) loss_mask: 0.5583 (0.6362) loss_objectness: 0.1660 (0.2046) loss_rpn_box_reg: 0.1026 (0.1130) time: 0.4418 (0.4455) data: 0.0170 (0.0175) cpu2cuda: 0.1101 (0.1176) forward: 0.2101 (0.2133) backward: 0.0697 (0.0788) lr: 0.018933 max mem: 3131
INFO: eta: 11:04:47 iter: 480 loss: 1.3543 (1.5083) loss_box_reg: 0.1279 (0.1244) loss_classifier: 0.3937 (0.4354) loss_mask: 0.5632 (0.6330) loss_objectness: 0.1731 (0.2029) loss_rpn_box_reg: 0.0806 (0.1126) time: 0.4420 (0.4456) data: 0.0139 (0.0174) cpu2cuda: 0.1218 (0.1179) forward: 0.2139 (0.2137) backward: 0.0682 (0.0786) lr: 0.019467 max mem: 3131
INFO: eta: 11:07:36 iter: 500 loss: 1.5172 (1.5122) loss_box_reg: 0.1848 (0.1269) loss_classifier: 0.5513 (0.4405) loss_mask: 0.5547 (0.6303) loss_objectness: 0.1759 (0.2021) loss_rpn_box_reg: 0.0831 (0.1124) time: 0.4883 (0.4476) data: 0.0177 (0.0175) cpu2cuda: 0.1091 (0.1173) forward: 0.2139 (0.2144) backward: 0.0857 (0.0797) lr: 0.020000 max mem: 3131
INFO: eta: 11:09:31 iter: 520 loss: 1.4537 (1.5131) loss_box_reg: 0.1538 (0.1294) loss_classifier: 0.5222 (0.4445) loss_mask: 0.5596 (0.6275) loss_objectness: 0.1241 (0.1995) loss_rpn_box_reg: 0.0683 (0.1123) time: 0.5054 (0.4489) data: 0.0171 (0.0175) cpu2cuda: 0.0947 (0.1164) forward: 0.2117 (0.2151) backward: 0.0931 (0.0809) lr: 0.020000 max mem: 3131
INFO: eta: 11:08:28 iter: 540 loss: 1.3338 (1.5086) loss_box_reg: 0.1210 (0.1294) loss_classifier: 0.3701 (0.4437) loss_mask: 0.5264 (0.6241) loss_objectness: 0.1558 (0.1987) loss_rpn_box_reg: 0.0874 (0.1127) time: 0.4245 (0.4483) data: 0.0146 (0.0176) cpu2cuda: 0.1203 (0.1165) forward: 0.2159 (0.2151) backward: 0.0705 (0.0805) lr: 0.020000 max mem: 3131
INFO: eta: 11:09:55 iter: 560 loss: 1.3118 (1.5056) loss_box_reg: 0.1342 (0.1303) loss_classifier: 0.4353 (0.4435) loss_mask: 0.5320 (0.6217) loss_objectness: 0.1433 (0.1975) loss_rpn_box_reg: 0.0967 (0.1126) time: 0.4589 (0.4494) data: 0.0180 (0.0177) cpu2cuda: 0.1188 (0.1164) forward: 0.2182 (0.2161) backward: 0.0775 (0.0806) lr: 0.020000 max mem: 3131
INFO: eta: 11:10:10 iter: 580 loss: 1.3408 (1.5029) loss_box_reg: 0.1251 (0.1304) loss_classifier: 0.4113 (0.4428) loss_mask: 0.5330 (0.6189) loss_objectness: 0.1415 (0.1982) loss_rpn_box_reg: 0.0856 (0.1125) time: 0.4313 (0.4497) data: 0.0145 (0.0177) cpu2cuda: 0.1261 (0.1165) forward: 0.2007 (0.2162) backward: 0.0685 (0.0804) lr: 0.020000 max mem: 3131
INFO: eta: 11:08:44 iter: 600 loss: 1.2969 (1.4975) loss_box_reg: 0.1264 (0.1297) loss_classifier: 0.3952 (0.4405) loss_mask: 0.5154 (0.6155) loss_objectness: 0.2217 (0.1985) loss_rpn_box_reg: 0.1029 (0.1133) time: 0.4169 (0.4488) data: 0.0117 (0.0175) cpu2cuda: 0.1247 (0.1167) forward: 0.1943 (0.2159) backward: 0.0645 (0.0802) lr: 0.020000 max mem: 3150
INFO: eta: 11:10:06 iter: 620 loss: 1.5374 (1.4976) loss_box_reg: 0.1542 (0.1308) loss_classifier: 0.4586 (0.4420) loss_mask: 0.5166 (0.6127) loss_objectness: 0.1697 (0.1982) loss_rpn_box_reg: 0.1198 (0.1140) time: 0.4781 (0.4498) data: 0.0155 (0.0175) cpu2cuda: 0.1057 (0.1163) forward: 0.2199 (0.2164) backward: 0.0900 (0.0810) lr: 0.020000 max mem: 3150
INFO: eta: 11:11:00 iter: 640 loss: 1.3474 (1.4970) loss_box_reg: 0.1557 (0.1321) loss_classifier: 0.4703 (0.4449) loss_mask: 0.5015 (0.6098) loss_objectness: 0.1281 (0.1964) loss_rpn_box_reg: 0.0733 (0.1137) time: 0.4888 (0.4505) data: 0.0179 (0.0176) cpu2cuda: 0.1139 (0.1160) forward: 0.2094 (0.2168) backward: 0.0787 (0.0813) lr: 0.020000 max mem: 3150
INFO: eta: 11:11:38 iter: 660 loss: 1.3212 (1.4941) loss_box_reg: 0.1433 (0.1330) loss_classifier: 0.4361 (0.4460) loss_mask: 0.5075 (0.6069) loss_objectness: 0.1383 (0.1949) loss_rpn_box_reg: 0.0770 (0.1132) time: 0.4615 (0.4511) data: 0.0171 (0.0177) cpu2cuda: 0.1180 (0.1158) forward: 0.2084 (0.2168) backward: 0.0792 (0.0816) lr: 0.020000 max mem: 3150
INFO: eta: 11:12:27 iter: 680 loss: 1.5639 (1.4951) loss_box_reg: 0.1397 (0.1339) loss_classifier: 0.4790 (0.4468) loss_mask: 0.5087 (0.6038) loss_objectness: 0.2357 (0.1966) loss_rpn_box_reg: 0.1220 (0.1140) time: 0.4541 (0.4517) data: 0.0121 (0.0176) cpu2cuda: 0.1308 (0.1160) forward: 0.2091 (0.2168) backward: 0.0692 (0.0816) lr: 0.020000 max mem: 3150
INFO: eta: 11:12:04 iter: 700 loss: 1.4248 (1.4917) loss_box_reg: 0.1590 (0.1342) loss_classifier: 0.4564 (0.4470) loss_mask: 0.4977 (0.6008) loss_objectness: 0.1468 (0.1956) loss_rpn_box_reg: 0.0818 (0.1142) time: 0.4293 (0.4516) data: 0.0136 (0.0176) cpu2cuda: 0.1206 (0.1160) forward: 0.2051 (0.2168) backward: 0.0674 (0.0816) lr: 0.020000 max mem: 3150
INFO: eta: 11:13:17 iter: 720 loss: 1.5100 (1.4919) loss_box_reg: 0.1465 (0.1352) loss_classifier: 0.4196 (0.4478) loss_mask: 0.5119 (0.5985) loss_objectness: 0.1686 (0.1950) loss_rpn_box_reg: 0.1084 (0.1153) time: 0.4487 (0.4525) data: 0.0218 (0.0177) cpu2cuda: 0.1146 (0.1159) forward: 0.2206 (0.2172) backward: 0.0822 (0.0817) lr: 0.020000 max mem: 3150
INFO: eta: 11:14:15 iter: 740 loss: 1.4570 (1.4908) loss_box_reg: 0.1518 (0.1361) loss_classifier: 0.4781 (0.4488) loss_mask: 0.5070 (0.5961) loss_objectness: 0.1752 (0.1942) loss_rpn_box_reg: 0.1007 (0.1155) time: 0.4968 (0.4532) data: 0.0163 (0.0178) cpu2cuda: 0.1122 (0.1157) forward: 0.2176 (0.2178) backward: 0.0864 (0.0820) lr: 0.020000 max mem: 3150
INFO: eta: 11:15:05 iter: 760 loss: 1.4302 (1.4901) loss_box_reg: 0.1487 (0.1369) loss_classifier: 0.4933 (0.4504) loss_mask: 0.5077 (0.5939) loss_objectness: 0.1690 (0.1937) loss_rpn_box_reg: 0.0551 (0.1152) time: 0.4706 (0.4539) data: 0.0143 (0.0177) cpu2cuda: 0.1006 (0.1154) forward: 0.2128 (0.2179) backward: 0.0852 (0.0826) lr: 0.020000 max mem: 3150
INFO: eta: 11:16:18 iter: 780 loss: 1.3618 (1.4883) loss_box_reg: 0.1646 (0.1377) loss_classifier: 0.4730 (0.4513) loss_mask: 0.5036 (0.5916) loss_objectness: 0.1263 (0.1930) loss_rpn_box_reg: 0.0764 (0.1147) time: 0.4732 (0.4548) data: 0.0148 (0.0177) cpu2cuda: 0.1088 (0.1153) forward: 0.2292 (0.2182) backward: 0.0815 (0.0830) lr: 0.020000 max mem: 3150
INFO: eta: 11:16:35 iter: 800 loss: 1.3844 (1.4862) loss_box_reg: 0.1618 (0.1386) loss_classifier: 0.4666 (0.4521) loss_mask: 0.5074 (0.5899) loss_objectness: 0.1282 (0.1914) loss_rpn_box_reg: 0.0784 (0.1143) time: 0.4400 (0.4551) data: 0.0171 (0.0178) cpu2cuda: 0.1069 (0.1152) forward: 0.2171 (0.2182) backward: 0.0777 (0.0831) lr: 0.020000 max mem: 3150
INFO: eta: 11:18:20 iter: 820 loss: 1.3302 (1.4839) loss_box_reg: 0.1663 (0.1395) loss_classifier: 0.4627 (0.4532) loss_mask: 0.4746 (0.5873) loss_objectness: 0.1106 (0.1901) loss_rpn_box_reg: 0.0857 (0.1139) time: 0.4915 (0.4564) data: 0.0134 (0.0177) cpu2cuda: 0.1099 (0.1151) forward: 0.2168 (0.2185) backward: 0.0947 (0.0836) lr: 0.020000 max mem: 3150
INFO: eta: 11:18:16 iter: 840 loss: 1.3802 (1.4815) loss_box_reg: 0.1743 (0.1402) loss_classifier: 0.4432 (0.4537) loss_mask: 0.4910 (0.5851) loss_objectness: 0.1341 (0.1889) loss_rpn_box_reg: 0.0609 (0.1136) time: 0.4388 (0.4564) data: 0.0175 (0.0178) cpu2cuda: 0.0897 (0.1147) forward: 0.2041 (0.2187) backward: 0.0834 (0.0839) lr: 0.020000 max mem: 3150
INFO: eta: 11:19:33 iter: 860 loss: 1.4601 (1.4808) loss_box_reg: 0.1751 (0.1413) loss_classifier: 0.5000 (0.4550) loss_mask: 0.4871 (0.5829) loss_objectness: 0.1441 (0.1881) loss_rpn_box_reg: 0.0912 (0.1136) time: 0.4993 (0.4574) data: 0.0150 (0.0178) cpu2cuda: 0.1059 (0.1145) forward: 0.2173 (0.2192) backward: 0.0958 (0.0844) lr: 0.020000 max mem: 3150
INFO: eta: 11:19:36 iter: 880 loss: 1.3573 (1.4778) loss_box_reg: 0.1448 (0.1416) loss_classifier: 0.4136 (0.4546) loss_mask: 0.4688 (0.5801) loss_objectness: 0.1379 (0.1874) loss_rpn_box_reg: 0.0714 (0.1140) time: 0.4696 (0.4575) data: 0.0168 (0.0178) cpu2cuda: 0.1092 (0.1144) forward: 0.2263 (0.2193) backward: 0.0838 (0.0845) lr: 0.020000 max mem: 3150
INFO: eta: 11:19:41 iter: 900 loss: 1.3399 (1.4742) loss_box_reg: 0.1298 (0.1415) loss_classifier: 0.3774 (0.4540) loss_mask: 0.4728 (0.5781) loss_objectness: 0.1435 (0.1865) loss_rpn_box_reg: 0.0996 (0.1141) time: 0.4442 (0.4577) data: 0.0133 (0.0177) cpu2cuda: 0.1196 (0.1144) forward: 0.2141 (0.2196) backward: 0.0782 (0.0846) lr: 0.020000 max mem: 3150
INFO: eta: 11:19:38 iter: 920 loss: 1.2642 (1.4721) loss_box_reg: 0.1208 (0.1414) loss_classifier: 0.3592 (0.4533) loss_mask: 0.4774 (0.5758) loss_objectness: 0.1824 (0.1869) loss_rpn_box_reg: 0.0963 (0.1146) time: 0.4458 (0.4578) data: 0.0177 (0.0178) cpu2cuda: 0.1147 (0.1144) forward: 0.2019 (0.2194) backward: 0.0695 (0.0845) lr: 0.020000 max mem: 3150
INFO: eta: 11:18:38 iter: 940 loss: 1.2608 (1.4677) loss_box_reg: 0.1143 (0.1410) loss_classifier: 0.3606 (0.4519) loss_mask: 0.4795 (0.5738) loss_objectness: 0.1210 (0.1866) loss_rpn_box_reg: 0.0533 (0.1143) time: 0.4236 (0.4572) data: 0.0107 (0.0177) cpu2cuda: 0.1272 (0.1147) forward: 0.1952 (0.2190) backward: 0.0629 (0.0842) lr: 0.020000 max mem: 3150
INFO: eta: 11:19:10 iter: 960 loss: 1.2979 (1.4660) loss_box_reg: 0.1443 (0.1413) loss_classifier: 0.4196 (0.4514) loss_mask: 0.5019 (0.5722) loss_objectness: 0.1670 (0.1869) loss_rpn_box_reg: 0.0812 (0.1142) time: 0.4584 (0.4577) data: 0.0155 (0.0178) cpu2cuda: 0.1085 (0.1145) forward: 0.2284 (0.2192) backward: 0.0941 (0.0845) lr: 0.020000 max mem: 3150
INFO: eta: 11:19:37 iter: 980 loss: 1.4718 (1.4667) loss_box_reg: 0.1443 (0.1417) loss_classifier: 0.4877 (0.4525) loss_mask: 0.5164 (0.5710) loss_objectness: 0.1626 (0.1870) loss_rpn_box_reg: 0.1266 (0.1146) time: 0.4362 (0.4581) data: 0.0144 (0.0177) cpu2cuda: 0.1171 (0.1144) forward: 0.2035 (0.2194) backward: 0.0738 (0.0848) lr: 0.020000 max mem: 3150
INFO: eta: 11:20:22 iter: 1000 loss: 1.4057 (1.4662) loss_box_reg: 0.1492 (0.1421) loss_classifier: 0.4260 (0.4529) loss_mask: 0.4791 (0.5692) loss_objectness: 0.1729 (0.1870) loss_rpn_box_reg: 0.1009 (0.1150) time: 0.4639 (0.4587) data: 0.0128 (0.0177) cpu2cuda: 0.1192 (0.1143) forward: 0.2081 (0.2196) backward: 0.0833 (0.0849) lr: 0.020000 max mem: 3150
INFO: eta: 11:20:31 iter: 1020 loss: 1.2775 (1.4635) loss_box_reg: 0.1344 (0.1421) loss_classifier: 0.3819 (0.4522) loss_mask: 0.4858 (0.5676) loss_objectness: 0.1292 (0.1863) loss_rpn_box_reg: 0.0710 (0.1152) time: 0.4564 (0.4589) data: 0.0145 (0.0177) cpu2cuda: 0.1289 (0.1145) forward: 0.2042 (0.2194) backward: 0.0691 (0.0849) lr: 0.020000 max mem: 3150
INFO: eta: 11:20:30 iter: 1040 loss: 1.1655 (1.4600) loss_box_reg: 0.1561 (0.1424) loss_classifier: 0.4020 (0.4515) loss_mask: 0.4736 (0.5660) loss_objectness: 0.1026 (0.1854) loss_rpn_box_reg: 0.0424 (0.1146) time: 0.4324 (0.4590) data: 0.0142 (0.0177) cpu2cuda: 0.1193 (0.1145) forward: 0.2108 (0.2195) backward: 0.0726 (0.0849) lr: 0.020000 max mem: 3150
INFO: eta: 11:20:49 iter: 1060 loss: 1.2989 (1.4586) loss_box_reg: 0.1455 (0.1427) loss_classifier: 0.4428 (0.4519) loss_mask: 0.5101 (0.5646) loss_objectness: 0.1336 (0.1849) loss_rpn_box_reg: 0.0955 (0.1147) time: 0.4536 (0.4593) data: 0.0144 (0.0177) cpu2cuda: 0.1153 (0.1145) forward: 0.2172 (0.2196) backward: 0.0756 (0.0849) lr: 0.020000 max mem: 3150
INFO: eta: 11:20:51 iter: 1080 loss: 1.2639 (1.4548) loss_box_reg: 0.1201 (0.1425) loss_classifier: 0.3473 (0.4505) loss_mask: 0.4856 (0.5633) loss_objectness: 0.1286 (0.1841) loss_rpn_box_reg: 0.0788 (0.1145) time: 0.4443 (0.4594) data: 0.0156 (0.0177) cpu2cuda: 0.1070 (0.1144) forward: 0.2117 (0.2197) backward: 0.0765 (0.0850) lr: 0.020000 max mem: 3150
INFO: eta: 11:20:46 iter: 1100 loss: 1.1469 (1.4501) loss_box_reg: 0.1215 (0.1424) loss_classifier: 0.3832 (0.4499) loss_mask: 0.4812 (0.5616) loss_objectness: 0.0790 (0.1825) loss_rpn_box_reg: 0.0600 (0.1136) time: 0.4421 (0.4595) data: 0.0132 (0.0176) cpu2cuda: 0.1212 (0.1145) forward: 0.2037 (0.2197) backward: 0.0682 (0.0849) lr: 0.020000 max mem: 3150
INFO: eta: 11:21:28 iter: 1120 loss: 1.4617 (1.4511) loss_box_reg: 0.1761 (0.1431) loss_classifier: 0.5327 (0.4516) loss_mask: 0.4678 (0.5598) loss_objectness: 0.1397 (0.1830) loss_rpn_box_reg: 0.0723 (0.1136) time: 0.5088 (0.4600) data: 0.0164 (0.0177) cpu2cuda: 0.1160 (0.1145) forward: 0.2219 (0.2200) backward: 0.0810 (0.0850) lr: 0.020000 max mem: 3150
INFO: eta: 11:21:51 iter: 1140 loss: 1.3548 (1.4514) loss_box_reg: 0.1438 (0.1434) loss_classifier: 0.4288 (0.4518) loss_mask: 0.4968 (0.5587) loss_objectness: 0.1977 (0.1836) loss_rpn_box_reg: 0.0669 (0.1139) time: 0.4698 (0.4604) data: 0.0176 (0.0177) cpu2cuda: 0.1157 (0.1144) forward: 0.2209 (0.2202) backward: 0.0814 (0.0853) lr: 0.020000 max mem: 3273
INFO: eta: 11:22:26 iter: 1160 loss: 1.3427 (1.4511) loss_box_reg: 0.1638 (0.1441) loss_classifier: 0.4600 (0.4529) loss_mask: 0.4664 (0.5572) loss_objectness: 0.1262 (0.1832) loss_rpn_box_reg: 0.0936 (0.1138) time: 0.4809 (0.4609) data: 0.0137 (0.0177) cpu2cuda: 0.1062 (0.1142) forward: 0.2181 (0.2206) backward: 0.0942 (0.0855) lr: 0.020000 max mem: 3273
INFO: eta: 11:22:50 iter: 1180 loss: 1.3135 (1.4501) loss_box_reg: 0.1386 (0.1441) loss_classifier: 0.4650 (0.4528) loss_mask: 0.4468 (0.5553) loss_objectness: 0.1685 (0.1835) loss_rpn_box_reg: 0.0692 (0.1143) time: 0.4649 (0.4613) data: 0.0138 (0.0177) cpu2cuda: 0.1181 (0.1142) forward: 0.2101 (0.2208) backward: 0.0678 (0.0856) lr: 0.020000 max mem: 3273
INFO: eta: 11:23:10 iter: 1200 loss: 1.2336 (1.4477) loss_box_reg: 0.1621 (0.1444) loss_classifier: 0.4282 (0.4529) loss_mask: 0.4587 (0.5539) loss_objectness: 0.1211 (0.1826) loss_rpn_box_reg: 0.0661 (0.1139) time: 0.4582 (0.4616) data: 0.0152 (0.0177) cpu2cuda: 0.1007 (0.1141) forward: 0.2152 (0.2210) backward: 0.0837 (0.0857) lr: 0.020000 max mem: 3273
INFO: eta: 11:23:02 iter: 1220 loss: 1.2236 (1.4450) loss_box_reg: 0.1330 (0.1442) loss_classifier: 0.4394 (0.4526) loss_mask: 0.4318 (0.5519) loss_objectness: 0.1276 (0.1822) loss_rpn_box_reg: 0.0969 (0.1142) time: 0.4600 (0.4616) data: 0.0140 (0.0177) cpu2cuda: 0.1153 (0.1141) forward: 0.2002 (0.2209) backward: 0.0702 (0.0856) lr: 0.020000 max mem: 3273
INFO: eta: 11:23:40 iter: 1240 loss: 1.1518 (1.4422) loss_box_reg: 0.1457 (0.1445) loss_classifier: 0.3850 (0.4525) loss_mask: 0.4426 (0.5502) loss_objectness: 0.1243 (0.1813) loss_rpn_box_reg: 0.0642 (0.1138) time: 0.4834 (0.4622) data: 0.0200 (0.0178) cpu2cuda: 0.1100 (0.1141) forward: 0.2294 (0.2213) backward: 0.0821 (0.0858) lr: 0.020000 max mem: 3273
INFO: eta: 11:24:01 iter: 1260 loss: 1.2766 (1.4402) loss_box_reg: 0.1349 (0.1446) loss_classifier: 0.4305 (0.4528) loss_mask: 0.4346 (0.5487) loss_objectness: 0.1248 (0.1806) loss_rpn_box_reg: 0.0866 (0.1135) time: 0.5034 (0.4625) data: 0.0151 (0.0178) cpu2cuda: 0.1135 (0.1141) forward: 0.2074 (0.2212) backward: 0.0744 (0.0859) lr: 0.020000 max mem: 3273
INFO: eta: 11:24:09 iter: 1280 loss: 1.2564 (1.4391) loss_box_reg: 0.1322 (0.1443) loss_classifier: 0.3514 (0.4517) loss_mask: 0.4578 (0.5475) loss_objectness: 0.1377 (0.1821) loss_rpn_box_reg: 0.0860 (0.1134) time: 0.4556 (0.4627) data: 0.0131 (0.0178) cpu2cuda: 0.1150 (0.1141) forward: 0.2050 (0.2211) backward: 0.0666 (0.0859) lr: 0.020000 max mem: 3273
INFO: eta: 11:24:00 iter: 1300 loss: 1.3876 (1.4396) loss_box_reg: 0.1307 (0.1442) loss_classifier: 0.4316 (0.4515) loss_mask: 0.4962 (0.5466) loss_objectness: 0.2181 (0.1836) loss_rpn_box_reg: 0.1055 (0.1137) time: 0.4579 (0.4627) data: 0.0133 (0.0178) cpu2cuda: 0.1253 (0.1143) forward: 0.2055 (0.2210) backward: 0.0739 (0.0857) lr: 0.020000 max mem: 3273
INFO: eta: 11:24:31 iter: 1320 loss: 1.4171 (1.4392) loss_box_reg: 0.1705 (0.1445) loss_classifier: 0.4820 (0.4518) loss_mask: 0.4673 (0.5454) loss_objectness: 0.1533 (0.1836) loss_rpn_box_reg: 0.0901 (0.1139) time: 0.4649 (0.4631) data: 0.0161 (0.0178) cpu2cuda: 0.1141 (0.1142) forward: 0.2058 (0.2213) backward: 0.0842 (0.0859) lr: 0.020000 max mem: 3273
INFO: eta: 11:24:10 iter: 1340 loss: 1.2669 (1.4367) loss_box_reg: 0.1360 (0.1443) loss_classifier: 0.4076 (0.4511) loss_mask: 0.4557 (0.5442) loss_objectness: 0.1463 (0.1832) loss_rpn_box_reg: 0.0800 (0.1140) time: 0.4322 (0.4630) data: 0.0148 (0.0178) cpu2cuda: 0.1205 (0.1142) forward: 0.2119 (0.2213) backward: 0.0804 (0.0860) lr: 0.020000 max mem: 3273
INFO: eta: 11:24:12 iter: 1360 loss: 1.4154 (1.4355) loss_box_reg: 0.1476 (0.1444) loss_classifier: 0.4398 (0.4512) loss_mask: 0.4522 (0.5430) loss_objectness: 0.1275 (0.1826) loss_rpn_box_reg: 0.0926 (0.1143) time: 0.4675 (0.4631) data: 0.0144 (0.0178) cpu2cuda: 0.1142 (0.1141) forward: 0.2017 (0.2215) backward: 0.0797 (0.0860) lr: 0.020000 max mem: 3273
INFO: eta: 11:24:07 iter: 1380 loss: 1.2346 (1.4354) loss_box_reg: 0.1460 (0.1446) loss_classifier: 0.4672 (0.4513) loss_mask: 0.4896 (0.5423) loss_objectness: 0.1470 (0.1828) loss_rpn_box_reg: 0.0816 (0.1144) time: 0.4637 (0.4632) data: 0.0211 (0.0179) cpu2cuda: 0.0922 (0.1139) forward: 0.2307 (0.2216) backward: 0.0882 (0.0862) lr: 0.020000 max mem: 3273
INFO: eta: 11:24:32 iter: 1400 loss: 1.2104 (1.4343) loss_box_reg: 0.1273 (0.1447) loss_classifier: 0.4121 (0.4516) loss_mask: 0.4433 (0.5410) loss_objectness: 0.1626 (0.1823) loss_rpn_box_reg: 0.1023 (0.1147) time: 0.4961 (0.4636) data: 0.0140 (0.0179) cpu2cuda: 0.1189 (0.1139) forward: 0.2141 (0.2217) backward: 0.0791 (0.0863) lr: 0.020000 max mem: 3273
INFO: eta: 11:25:34 iter: 1420 loss: 1.3347 (1.4339) loss_box_reg: 0.1865 (0.1452) loss_classifier: 0.5047 (0.4522) loss_mask: 0.4837 (0.5399) loss_objectness: 0.0887 (0.1819) loss_rpn_box_reg: 0.0901 (0.1147) time: 0.5229 (0.4644) data: 0.0149 (0.0179) cpu2cuda: 0.1155 (0.1139) forward: 0.2449 (0.2222) backward: 0.0855 (0.0864) lr: 0.020000 max mem: 3273
INFO: eta: 11:25:56 iter: 1440 loss: 1.3153 (1.4326) loss_box_reg: 0.1629 (0.1455) loss_classifier: 0.4455 (0.4523) loss_mask: 0.4635 (0.5388) loss_objectness: 0.1359 (0.1815) loss_rpn_box_reg: 0.0972 (0.1145) time: 0.4920 (0.4647) data: 0.0163 (0.0179) cpu2cuda: 0.1200 (0.1138) forward: 0.2206 (0.2223) backward: 0.0826 (0.0866) lr: 0.020000 max mem: 3273
INFO: eta: 11:25:47 iter: 1460 loss: 1.2140 (1.4312) loss_box_reg: 0.1361 (0.1456) loss_classifier: 0.4297 (0.4523) loss_mask: 0.4435 (0.5377) loss_objectness: 0.1296 (0.1812) loss_rpn_box_reg: 0.0594 (0.1143) time: 0.4533 (0.4647) data: 0.0170 (0.0179) cpu2cuda: 0.1152 (0.1139) forward: 0.2163 (0.2225) backward: 0.0715 (0.0864) lr: 0.020000 max mem: 3273
INFO: eta: 11:26:06 iter: 1480 loss: 1.3437 (1.4296) loss_box_reg: 0.1438 (0.1457) loss_classifier: 0.4536 (0.4523) loss_mask: 0.4508 (0.5367) loss_objectness: 0.0970 (0.1808) loss_rpn_box_reg: 0.0583 (0.1142) time: 0.4664 (0.4651) data: 0.0153 (0.0179) cpu2cuda: 0.1187 (0.1138) forward: 0.2284 (0.2228) backward: 0.0781 (0.0866) lr: 0.020000 max mem: 3273
INFO: eta: 11:26:02 iter: 1500 loss: 1.2773 (1.4286) loss_box_reg: 0.1214 (0.1458) loss_classifier: 0.4077 (0.4524) loss_mask: 0.4667 (0.5358) loss_objectness: 0.1209 (0.1804) loss_rpn_box_reg: 0.0908 (0.1142) time: 0.4579 (0.4651) data: 0.0131 (0.0179) cpu2cuda: 0.1286 (0.1140) forward: 0.2033 (0.2226) backward: 0.0657 (0.0864) lr: 0.020000 max mem: 3273
INFO: eta: 11:26:05 iter: 1520 loss: 1.3348 (1.4270) loss_box_reg: 0.1425 (0.1460) loss_classifier: 0.4405 (0.4524) loss_mask: 0.4363 (0.5346) loss_objectness: 0.1139 (0.1799) loss_rpn_box_reg: 0.0849 (0.1141) time: 0.4378 (0.4653) data: 0.0135 (0.0179) cpu2cuda: 0.0996 (0.1139) forward: 0.2083 (0.2227) backward: 0.0900 (0.0866) lr: 0.020000 max mem: 3273
INFO: eta: 11:25:59 iter: 1540 loss: 1.2242 (1.4244) loss_box_reg: 0.1164 (0.1457) loss_classifier: 0.3341 (0.4513) loss_mask: 0.4476 (0.5334) loss_objectness: 0.1547 (0.1797) loss_rpn_box_reg: 0.0962 (0.1143) time: 0.4524 (0.4653) data: 0.0150 (0.0179) cpu2cuda: 0.1201 (0.1139) forward: 0.2106 (0.2227) backward: 0.0721 (0.0866) lr: 0.020000 max mem: 3273
INFO: eta: 11:26:32 iter: 1560 loss: 1.3610 (1.4243) loss_box_reg: 0.1626 (0.1461) loss_classifier: 0.4904 (0.4524) loss_mask: 0.4439 (0.5324) loss_objectness: 0.1352 (0.1792) loss_rpn_box_reg: 0.0989 (0.1141) time: 0.5037 (0.4658) data: 0.0138 (0.0179) cpu2cuda: 0.1173 (0.1139) forward: 0.2162 (0.2229) backward: 0.0907 (0.0868) lr: 0.020000 max mem: 3273
INFO: eta: 11:26:43 iter: 1580 loss: 1.3862 (1.4230) loss_box_reg: 0.1435 (0.1462) loss_classifier: 0.4196 (0.4523) loss_mask: 0.4290 (0.5312) loss_objectness: 0.1650 (0.1791) loss_rpn_box_reg: 0.0776 (0.1142) time: 0.4742 (0.4660) data: 0.0170 (0.0179) cpu2cuda: 0.1283 (0.1140) forward: 0.2213 (0.2230) backward: 0.0790 (0.0868) lr: 0.020000 max mem: 3273
INFO: eta: 11:27:19 iter: 1600 loss: 1.2703 (1.4213) loss_box_reg: 0.1463 (0.1463) loss_classifier: 0.4256 (0.4521) loss_mask: 0.4321 (0.5300) loss_objectness: 0.1267 (0.1788) loss_rpn_box_reg: 0.1001 (0.1142) time: 0.4736 (0.4665) data: 0.0137 (0.0179) cpu2cuda: 0.1188 (0.1139) forward: 0.2191 (0.2231) backward: 0.0753 (0.0870) lr: 0.020000 max mem: 3273
INFO: eta: 11:27:46 iter: 1620 loss: 1.2103 (1.4194) loss_box_reg: 0.1495 (0.1464) loss_classifier: 0.4146 (0.4521) loss_mask: 0.4290 (0.5289) loss_objectness: 0.1248 (0.1781) loss_rpn_box_reg: 0.0973 (0.1140) time: 0.4593 (0.4669) data: 0.0129 (0.0179) cpu2cuda: 0.1048 (0.1139) forward: 0.2447 (0.2236) backward: 0.0796 (0.0871) lr: 0.020000 max mem: 3273
INFO: eta: 11:27:49 iter: 1640 loss: 1.1808 (1.4179) loss_box_reg: 0.1302 (0.1465) loss_classifier: 0.3935 (0.4517) loss_mask: 0.4449 (0.5280) loss_objectness: 0.1249 (0.1776) loss_rpn_box_reg: 0.0955 (0.1141) time: 0.4538 (0.4671) data: 0.0185 (0.0179) cpu2cuda: 0.1081 (0.1136) forward: 0.2301 (0.2238) backward: 0.0819 (0.0872) lr: 0.020000 max mem: 3273
INFO: eta: 11:27:48 iter: 1660 loss: 1.0554 (1.4155) loss_box_reg: 0.1203 (0.1464) loss_classifier: 0.3368 (0.4510) loss_mask: 0.4296 (0.5271) loss_objectness: 0.1124 (0.1771) loss_rpn_box_reg: 0.0874 (0.1139) time: 0.4631 (0.4672) data: 0.0145 (0.0180) cpu2cuda: 0.1190 (0.1136) forward: 0.2103 (0.2238) backward: 0.0777 (0.0872) lr: 0.020000 max mem: 3273
INFO: eta: 11:28:14 iter: 1680 loss: 1.1996 (1.4132) loss_box_reg: 0.1500 (0.1467) loss_classifier: 0.4520 (0.4512) loss_mask: 0.4379 (0.5260) loss_objectness: 0.0933 (0.1762) loss_rpn_box_reg: 0.0581 (0.1133) time: 0.4892 (0.4676) data: 0.0174 (0.0180) cpu2cuda: 0.1146 (0.1136) forward: 0.2311 (0.2240) backward: 0.0849 (0.0873) lr: 0.020000 max mem: 3273
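To put numbers on the trend, the timing fields can be pulled out of the log lines and compared. The sketch below is only an illustration: it assumes each `name: a (b)` pair is a recent value followed by a running average in parentheses, and it uses the lines for iterations 160 and 1680 with the loss fields trimmed. Iteration 160 is chosen because the value in parentheses after `time:` is lowest there, after the start-up iterations. The names `parse_line` and `relative_change` are hypothetical.

```python
import re

# "name: value (value)" pairs, e.g. "forward: 0.1983 (0.2030)"
PAIR = re.compile(r"(\w+): ([\d.]+) \(([\d.]+)\)")

# Timing fields of two log lines from this post (loss fields trimmed)
EARLY = ("iter: 160 time: 0.4124 (0.4252) data: 0.0097 (0.0167) "
         "cpu2cuda: 0.1293 (0.1265) forward: 0.1983 (0.2030) "
         "backward: 0.0583 (0.0656) lr: 0.010933 max mem: 2957")
LATE = ("iter: 1680 time: 0.4892 (0.4676) data: 0.0174 (0.0180) "
        "cpu2cuda: 0.1146 (0.1136) forward: 0.2311 (0.2240) "
        "backward: 0.0849 (0.0873) lr: 0.020000 max mem: 3273")


def parse_line(line):
    """Return (iteration, max_mem, {name: (first value, value in parentheses)})."""
    iteration = int(re.search(r"iter: (\d+)", line).group(1))
    max_mem = int(re.search(r"max mem: (\d+)", line).group(1))
    metrics = {name: (float(a), float(b)) for name, a, b in PAIR.findall(line)}
    return iteration, max_mem, metrics


def relative_change(early, late, name):
    """Relative change of the parenthesised value of one metric."""
    return (late[name][1] - early[name][1]) / early[name][1]


_, _, early = parse_line(EARLY)
_, _, late = parse_line(LATE)
for name in ("time", "cpu2cuda", "forward", "backward"):
    print(f"{name}: {relative_change(early, late, name):+.1%}")
```

For these two lines the parenthesised forward and backward values grow by roughly 10% and 33%, while the cpu2cuda value shrinks, which matches the observation that the slowdown sits in the forward and backward passes.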
The GPU memory consumption reported by nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:05:00.0 Off |                    0 |
| N/A   49C    P0   139W / 250W |   5925MiB / 16276MiB |     57%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:87:00.0 Off |                    0 |
| N/A   51C    P0   144W / 250W |   6611MiB / 16276MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     94173      C   .../anaconda3/bin/python                    5915MiB |
|    1     94174      C   .../anaconda3/bin/python                    6601MiB |
+-----------------------------------------------------------------------------+
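The size of the mismatch can be read straight off the process table. The sketch below parses the two process rows (whitespace collapsed) and compares them with the last `max mem` value in the training log. It assumes that value is in MB and that each GPU hosts a single training process. The name `process_memory` is hypothetical. One plausible reason for the gap, not confirmed in this thread: if `max mem` is taken from `torch.cuda.max_memory_allocated()`, it counts only memory occupied by tensors, whereas nvidia-smi reports everything the process holds on the device, including blocks kept by PyTorch's caching allocator and the CUDA context.

```python
import re

# Process rows copied from the nvidia-smi output in this post
SMI_PROCESSES = """\
| 0 94173 C .../anaconda3/bin/python 5915MiB |
| 1 94174 C .../anaconda3/bin/python 6601MiB |
"""

# Last "max mem" value in the training log; the unit is assumed to be MB
LOGGED_MAX_MEM = 3273

# Columns: GPU index, PID, type, process name, memory usage
ROW = re.compile(r"\|\s*(\d+)\s+(\d+)\s+\w\s+(\S+)\s+(\d+)MiB\s*\|")


def process_memory(smi_text):
    """Map GPU index -> memory (MiB) of the process listed on that GPU."""
    return {int(gpu): int(mem) for gpu, _pid, _name, mem in ROW.findall(smi_text)}


for gpu, used in process_memory(SMI_PROCESSES).items():
    gap = used - LOGGED_MAX_MEM
    print(f"GPU {gpu}: nvidia-smi {used} MiB, logged {LOGGED_MAX_MEM}, gap {gap}")
```

If the allocator explanation applies, the gap is expected rather than a sign of a leak; comparing against the allocator's cached-memory counter in the same process would be one way to check.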
Top GitHub Comments
Finally, it took me more than 2 days to finish training Keypoints R-CNN with 2 P100 GPUs, and I got similar performance on the coco_minival2014 dataset to that reported in MODEL_ZOO.md.

Hi @fmassa, I’d appreciate it if you could tell me why the training time and speed differ so much between my run and yours, while the inference time is very similar (mine is even faster than yours). Could it be that the communication between the two GPUs (i.e., nccl) doesn’t work well in my setup? Looking forward to your reply, thank you very much!

@sarahmass Yes, it did.