Runtime Error when resuming trained model

Hello, I have trained a model, and when I try to resume training on a bigger dataset, I run into this error:

loading checkpoint ./trained_models/vgg16/pascal_voc/faster_rcnn_1_1_41.pth
loaded checkpoint ./trained_models/vgg16/pascal_voc/faster_rcnn_1_1_41.pth
/home/shin/faster-rcnn.pytorch/lib/model/rpn/rpn.py:68: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
/home/shin/faster-rcnn.pytorch/lib/model/faster_rcnn/faster_rcnn.py:98: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  cls_prob = F.softmax(cls_score)
Traceback (most recent call last):
  File "trainval_net.py", line 335, in <module>
    optimizer.step()
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/sgd.py", line 94, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:271

The training parameters are the same. In fact, even when I train a model for 1 epoch and then resume it, the same issue occurs…
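The traceback above fails inside SGD's momentum update (`buf.mul_(momentum).add_(1 - dampening, d_p)`), which suggests that a restored momentum buffer no longer has the same shape as the gradient of the parameter it is paired with. One way to narrow this down is to compare the two shapes right after the optimizer state is restored. The following is only a diagnostic sketch, not a fix from the thread: the helper name `find_momentum_shape_mismatches` is hypothetical, and it relies only on the `param_groups` and `state` attributes of a `torch.optim` optimizer and on the `"momentum_buffer"` entry that SGD keeps per parameter.

```python
def find_momentum_shape_mismatches(optimizer):
    """Return (group_index, param_index, param_shape, buffer_shape) for every
    parameter whose SGD momentum buffer has a different shape.

    Hypothetical helper for illustration; it only reads the optimizer.
    """
    mismatches = []
    for g_idx, group in enumerate(optimizer.param_groups):
        for p_idx, param in enumerate(group["params"]):
            # optimizer.state maps each parameter to its per-parameter state;
            # SGD with momentum stores the running buffer as "momentum_buffer".
            buf = optimizer.state.get(param, {}).get("momentum_buffer")
            if buf is None:
                continue  # no momentum recorded for this parameter yet
            if tuple(buf.shape) != tuple(param.shape):
                mismatches.append(
                    (g_idx, p_idx, tuple(param.shape), tuple(buf.shape))
                )
    return mismatches
```

Assuming the checkpoint stores the optimizer state and it is restored with `optimizer.load_state_dict(...)`, calling `find_momentum_shape_mismatches(optimizer)` right after that call and printing the result would show which parameters received a buffer of the wrong shape. An empty list would mean the buffers line up and the cause lies elsewhere.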

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 5
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

1 reaction
xwjBupt commented, Sep 7, 2018

@Liu0329 @shinshiner Hi guys, did you fix this problem? I also ran into it when I tried to use the pretrained model faster_rcnn_1_7_10021.pth on my own dataset. I tried commenting out these two lines:

if args.mGPUs:
    fasterRCNN = nn.DataParallel(fasterRCNN)

but it did not work. What should I do? Thank you!

1 reaction
jwyang commented, Feb 6, 2018

@shinshiner great!

