strange runtime error: dimension specified as 0 but tensor has no dimensions

I have 4 GPUs on my machine. Running training with --dataset pascal_voc --net res101 --bs 8 --nw 4 --lr 4e-3 --lr_decay_step 8 --cuda --mGPUs gives this error:

Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py:24: UserWarning: 
    There is an imbalance between your GPUs. You may want to exclude GPU 0 which
    has less than 75% of the memory or cores of GPU 1. You can do so by setting
    the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
    environment variable.
  warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
/home/user/prj/pytorch-faster-rcnn/lib/model/rpn/rpn.py:68: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
/home/user/prj/pytorch-faster-rcnn/lib/model/faster_rcnn/faster_rcnn.py:98: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  cls_prob = F.softmax(cls_score)
Traceback (most recent call last):
  File "/home/user/.pycharm_helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/home/user/.pycharm_helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/user/.pycharm_helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/user/prj/pytorch-faster-rcnn/trainval_net.py", line 323, in <module>
    rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
    return self.gather(outputs, self.output_device)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    return gather_map(outputs)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

Issue Analytics

Created 5 years ago
Comments: 5

Top GitHub Comments

37 reactions

ljtruong commented, Jul 4, 2018

@wtl-zju thank you, that works. I'm using Python 3 with PyTorch 0.4 in a virtualenv.

There is a slight error in @wtl-zju's comment.

To clarify, add these lines just before returning the values in lib/model/faster_rcnn/faster_rcnn.py:

    if self.training:
        rpn_loss_cls = torch.unsqueeze(rpn_loss_cls, 0)
        rpn_loss_bbox = torch.unsqueeze(rpn_loss_bbox, 0)
        RCNN_loss_cls = torch.unsqueeze(RCNN_loss_cls, 0)
        RCNN_loss_bbox = torch.unsqueeze(RCNN_loss_bbox, 0)

The lines go inside the self.training check because the losses are only needed during training, not when testing or predicting. Additionally, the variables are initialized to 0, which can be seen a few lines above in the code.
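For anyone who wants to see the placement in isolation, here is a minimal sketch of the pattern described above. TinyDetector, its single linear layer, and the cross-entropy loss are hypothetical stand-ins, not the actual code from faster_rcnn.py; only the shape of the logic (loss initialized to 0, then computed and unsqueezed inside the self.training check) mirrors the fix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyDetector(nn.Module):
    """Hypothetical stand-in for the real model, for illustration only."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x, target):
        scores = self.fc(x)

        # The loss starts out as a plain Python 0, as in the original file.
        loss_cls = 0

        if self.training:
            # cross_entropy returns a 0-dim (scalar) tensor by default.
            loss_cls = F.cross_entropy(scores, target)
            # Add a leading dimension so the loss has shape (1,).
            loss_cls = torch.unsqueeze(loss_cls, 0)

        return scores, loss_cls


model = TinyDetector()
x = torch.randn(3, 4)
target = torch.tensor([0, 1, 0])

model.train()
_, train_loss = model(x, target)

model.eval()
_, eval_loss = model(x, target)

print(train_loss.shape)  # 1-D tensor with a single element
print(eval_loss)         # still the plain Python 0
```

Keeping the unsqueeze calls inside the check also avoids calling torch.unsqueeze on the plain 0, which is not a tensor.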

9 reactions

tianlu-wang commented, Jul 4, 2018

I just fixed this problem by unsqueezing RCNN_loss_cls, RCNN_loss_bbox, rpn_loss_cls, and rpn_loss_bbox in lib/model/faster_rcnn/faster_rcnn.py. Basically, scalar tensors in PyTorch 0.4 cause the error, so you need to add one more dimension: rpn_loss_cls = torch.unsqueeze(rpn_loss_cls, 0) … BTW, I compiled PyTorch 0.4 from source, but I think it should also work if you install it from conda.
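A rough way to reproduce the idea without multiple GPUs, assuming the gather step boils down to asking each output for its size along dim 0 and then concatenating (which is what the traceback suggests). The per_gpu_losses values are made up, and averaging with mean() at the end is just one possible way to reduce the gathered vector.

```python
import torch

# Hypothetical losses, one 0-dim (scalar) tensor per GPU replica.
per_gpu_losses = [
    torch.tensor(0.5),
    torch.tensor(0.7),
    torch.tensor(0.9),
    torch.tensor(1.1),
]

scalar = per_gpu_losses[0]
print(scalar.dim())  # a scalar tensor has no dimensions

# Asking a 0-dim tensor for its size along dim 0 fails. The exception
# type differs between PyTorch versions, so both are caught here.
try:
    scalar.size(0)
    size_call_failed = False
except (IndexError, RuntimeError) as exc:
    size_call_failed = True
    print("size(0) failed:", exc)

# After unsqueezing, each loss has shape (1,) and can be concatenated
# along dim 0 into one vector with an entry per replica.
unsqueezed = [torch.unsqueeze(loss, 0) for loss in per_gpu_losses]
gathered = torch.cat(unsqueezed, dim=0)
print(gathered.shape)

# Reduce the gathered vector back to a single value for backward().
total = gathered.mean()
print(total.item())
```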