resume the training for pytorch 1.0
When I use the pytorch-1.0 branch, training on the Pascal VOC dataset works. But when I interrupt the training and resume from a previously saved checkpoint, I get a RuntimeError. Has anyone else had this problem? The output is shown below:
Loaded dataset voc_2007_trainval for training
Set proposal method: gt
Appending horizontally-flipped training examples…
voc_2007_trainval gt roidb loaded from /home/user02/notebook/faster-rcnn.pytorch-pytorch-1.0/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data…
done
before filtering, there are 10022 images…
after filtering, there are 10022 images…
10022 roidb entries
Loading pretrained weights from data/pretrained_model/vgg16_caffe.pth
loading checkpoint models/vgg16/pascal_voc/faster_rcnn_1_3_10021.pth
loaded checkpoint models/vgg16/pascal_voc/faster_rcnn_1_3_10021.pth
Traceback (most recent call last):
  File "trainval_net.py", line 355, in <module>
    optimizer.step()
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/sgd.py", line 101, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: The size of tensor a (512) must match the size of tensor b (18) at non-singleton dimension 0
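The failing line multiplies a momentum buffer and then adds the gradient to it, so the message suggests that a buffer of size 512 in the restored optimizer state ended up paired with a parameter of size 18. One way to check this before calling optimizer.step() is sketched below. The helper name find_momentum_mismatches is hypothetical (it is not part of the repo or of PyTorch), and it assumes only that the optimizer exposes param_groups and state the way torch.optim.SGD does, with the buffer stored under the "momentum_buffer" key.

```python
def find_momentum_mismatches(optimizer):
    """List parameters whose restored momentum buffer has a different shape.

    Hypothetical diagnostic helper, not part of the repo or of PyTorch.
    Assumes `optimizer.param_groups` is a list of dicts with a "params" list
    and `optimizer.state` maps each parameter to a dict that may hold a
    "momentum_buffer" entry, as torch.optim.SGD does.
    """
    mismatches = []
    for group_idx, group in enumerate(optimizer.param_groups):
        for param_idx, param in enumerate(group["params"]):
            # Parameters that have not been stepped yet have no state entry.
            buf = optimizer.state.get(param, {}).get("momentum_buffer")
            if buf is None:
                continue
            if tuple(buf.shape) != tuple(param.shape):
                mismatches.append(
                    (group_idx, param_idx, tuple(param.shape), tuple(buf.shape))
                )
    return mismatches
```

Calling it right after the optimizer state is restored from the checkpoint and printing the result shows which (group index, parameter index) pairs are affected. If many entries come back, a likely (but here unconfirmed) explanation is that the parameters were handed to the optimizer in a different order than when the checkpoint was saved.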
Top GitHub Comments
This might be a stretch, but you could try this: https://github.com/jwyang/faster-rcnn.pytorch/issues/475#issuecomment-483243293
Great, happy to see it worked for you!
I'm using the PyTorch-1.0 branch as well. The listed pretrained models give inconsistent results on it, apparently because of a slight change between PyTorch 0.4.0 and PyTorch 1.0. I'm not sure what exactly changed, but I believe it is covered in one of the issues in this repo. When I train the model myself, the results are fairly consistent, though I still see a small difference in performance (0.5~2.0 mAP).