Runtime Error when resuming trained model

Hello, I have trained a model. When I try to resume training from its checkpoint on a bigger dataset, I get the following error:

loading checkpoint ./trained_models/vgg16/pascal_voc/faster_rcnn_1_1_41.pth
loaded checkpoint ./trained_models/vgg16/pascal_voc/faster_rcnn_1_1_41.pth
/home/shin/faster-rcnn.pytorch/lib/model/rpn/rpn.py:68: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
/home/shin/faster-rcnn.pytorch/lib/model/faster_rcnn/faster_rcnn.py:98: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  cls_prob = F.softmax(cls_score)
Traceback (most recent call last):
  File "trainval_net.py", line 335, in <module>
    optimizer.step()
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/sgd.py", line 94, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:271

The training parameters are the same. In fact, the same error occurs even when I train a model for 1 epoch and then resume it…
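
The traceback ends in the momentum update of torch.optim.SGD, where buf is the stored momentum buffer and d_p is the current gradient, so "sizes do not match" suggests that a restored buffer has a different shape from the parameter it is attached to. The following is a minimal diagnostic sketch, not a fix from the maintainers. It assumes the optimizer state was restored from the checkpoint (for example with optimizer.load_state_dict) and that the optimizer exposes param_groups and state the way torch.optim optimizers do. The function name is hypothetical.

```python
def find_mismatched_momentum_buffers(optimizer):
    """Return (group_index, param_index, param_shape, buffer_shape) for every
    parameter whose stored momentum buffer has a different shape.

    Hypothetical helper: it relies only on `optimizer.param_groups` and
    `optimizer.state`, which torch.optim optimizers provide.
    """
    mismatches = []
    for g_idx, group in enumerate(optimizer.param_groups):
        for p_idx, param in enumerate(group["params"]):
            # torch.optim.SGD keeps per-parameter state in optimizer.state
            # and stores the momentum under the key "momentum_buffer".
            buf = optimizer.state.get(param, {}).get("momentum_buffer")
            if buf is not None and tuple(buf.shape) != tuple(param.shape):
                mismatches.append(
                    (g_idx, p_idx, tuple(param.shape), tuple(buf.shape))
                )
    return mismatches
```

Calling this right after the optimizer state is restored, and before the first optimizer.step(), would list any parameter whose saved buffer no longer lines up with it. One possible cause, stated here only as an assumption, is that the parameters are handed to the optimizer in a different order than when the checkpoint was written.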

Issue Analytics

State:
Created 6 years ago
Reactions: 5
Comments: 9 (2 by maintainers)

Top GitHub Comments

1 reaction

xwjBupt commented, Sep 7, 2018

@Liu0329 @shinshiner Hi guys, did you fix this problem? I ran into the same error when trying to use the pretrained model faster_rcnn_1_7_10021.pth on my own dataset. I tried commenting out these two lines:

if args.mGPUs:
    fasterRCNN = nn.DataParallel(fasterRCNN)

but it did not work. What should I do? Thank you!

1 reaction

jwyang commented, Feb 6, 2018

@shinshiner great!
