strange runtime error: dimension specified as 0 but tensor has no dimensions
I have 4 GPUs on my machine and am running training with
--dataset pascal_voc --net res101 --bs 8 --nw 4 --lr 4e-3 --lr_decay_step 8 --cuda --mGPUs
but I get this error:
Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py:24: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 0 which
has less than 75% of the memory or cores of GPU 1. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
/home/user/prj/pytorch-faster-rcnn/lib/model/rpn/rpn.py:68: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
/home/user/prj/pytorch-faster-rcnn/lib/model/faster_rcnn/faster_rcnn.py:98: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
cls_prob = F.softmax(cls_score)
Traceback (most recent call last):
File "/home/user/.pycharm_helpers/pydev/pydevd.py", line 1664, in <module>
main()
File "/home/user/.pycharm_helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/user/.pycharm_helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/user/prj/pytorch-faster-rcnn/trainval_net.py", line 323, in <module>
rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
return self.gather(outputs, self.output_device)
File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
return gather_map(outputs)
File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions
Top GitHub Comments
@wtl-zju thank you, it works. I am using Python 3 with PyTorch 0.4 in a virtualenv.
There is a slight error in @wtl-zju's comment.
To clarify, add these lines just before returning the values in lib/model/faster_rcnn/faster_rcnn.py:

    if self.training:
        rpn_loss_cls = torch.unsqueeze(rpn_loss_cls, 0)
        rpn_loss_bbox = torch.unsqueeze(rpn_loss_bbox, 0)
        RCNN_loss_cls = torch.unsqueeze(RCNN_loss_cls, 0)
        RCNN_loss_bbox = torch.unsqueeze(RCNN_loss_bbox, 0)
The block is placed inside the self.training check because these losses are only computed during training, not when testing or predicting. Outside of training, the loss variables are simply set to 0, which can be seen a few lines above in the code.
I just fixed this problem by unsqueezing RCNN_loss_cls, RCNN_loss_bbox, rpn_loss_cls, and rpn_loss_bbox in lib/model/faster_rcnn/faster_rcnn.py. Basically, scalar (0-dimensional) tensors in PyTorch 0.4 cause the error, so you need to add one more dimension: rpn_loss_cls = torch.unsqueeze(rpn_loss_cls, 0) … BTW, I compiled PyTorch 0.4 from source, but I think it should also work if you install it from conda.
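For anyone wondering why the unsqueeze helps, here is a minimal CPU-only sketch of the idea. It is not the repository's code: gather_losses is a hypothetical stand-in that uses torch.cat to roughly imitate how DataParallel joins per-GPU outputs along dim 0, and the loss values are made up. Taking the mean at the end is an assumption about how the combined loss would be reduced before calling backward.

```python
import torch


def gather_losses(per_replica_losses):
    # Hypothetical helper: join one loss per replica along dim 0,
    # roughly imitating the concatenation step of a multi-GPU gather.
    return torch.cat(per_replica_losses, dim=0)


# Made-up losses, one per replica. Each is a 0-dimensional (scalar) tensor.
scalar_losses = [torch.tensor(0.5), torch.tensor(1.5)]
assert scalar_losses[0].dim() == 0

# A scalar tensor has no dim 0 to concatenate along, so this fails.
try:
    gather_losses(scalar_losses)
    cat_failed = False
except RuntimeError:
    cat_failed = True

# Adding a leading dimension gives each loss shape (1,), which can be joined.
fixed_losses = [torch.unsqueeze(loss, 0) for loss in scalar_losses]
gathered = gather_losses(fixed_losses)

# Assumed reduction: average the per-replica losses back to a single scalar.
total = gathered.mean()
```

After the fix, the gathered loss has one entry per replica, so whatever consumes it needs to reduce it to a scalar again before backpropagating.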