[BUG]: Inference error: 'Sizes of tensors must match except in dimension 1'

Hi,

I fine-tuned XD_XD’s model on my own dataset (the training configuration had pretrained set to true) and am now trying to run inference on an image that was also in the training set, just to see how it comes out. However, doing so throws this error:

Traceback (most recent call last):
  File "c:/Users/blue/Documents/solaris/xdxd_inference.py", line 9, in <module>
    inferer(inference_data)
  File "c:\Users\blue\Documents\solaris\solaris\nets\infer.py", line 87, in __call__
    subarr_preds = self.model(inf_input)
  File "C:\Users\blue\Miniconda3\envs\solaris\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "c:\Users\blue\Documents\solaris\solaris\nets\zoo\xdxd_sn4.py", line 48, in forward
    dec5 = self.dec5(torch.cat([center, conv5], 1))
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 31 and 30 in dimension 2 at C:/w/1/s/tmp_conda_3.6_035809/conda/conda-bld/pytorch_1556683229598/work/aten/src\THC/generic/THCTensorMath.cu:71

Oddly, inference succeeds on images that were not in the training dataset and that have a different size. So this doesn’t strike me as a bug but rather as a settings oversight on my part. Any idea why the error is thrown? Both the training images and the failing test images are (500, 500, 3) and are normalised the same way. Random (320, 320) crops are taken from the training images during augmentation, and inference on non-training data of size (480, 480, 3) works without problems. The only settings that differ between the training and inference configuration files are train, infer and pretrained.

yaaml.zip (this behaviour also occurs when windows_step_size is not set, and when set to 500)
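
For anyone landing here with the same traceback, the "Got 31 and 30" can be reproduced with plain integer arithmetic. This is only a sketch under assumptions that are not confirmed anywhere in this thread: that the encoder halves the spatial size five times with floor division, and that the bottleneck (`center`) is upsampled by exactly 2 before being concatenated with `conv5`. The function names are made up for illustration and are not part of solaris.

```python
def encoder_sizes(size, n_pools=5):
    """Spatial size before pooling and after each assumed 2x pooling step."""
    sizes = [size]
    for _ in range(n_pools):
        sizes.append(sizes[-1] // 2)  # assumed floor-halving per pooling layer
    return sizes


def first_concat_sizes(size, n_pools=5):
    """Return (upsampled_center, deepest_skip) at the first decoder concat only.

    Hypothetical helper: it does not check the shallower concats.
    """
    sizes = encoder_sizes(size, n_pools)
    deepest_skip = sizes[-2]   # feature map before the last pooling
    upsampled = sizes[-1] * 2  # bottleneck after an assumed 2x upsampling
    return upsampled, deepest_skip


for s in (480, 500, 512):
    up, skip = first_concat_sizes(s)
    print(s, up, skip, "ok" if up == skip else "mismatch")
```

Under those assumptions, a 500-pixel side gives 31 on the skip branch and 30 on the upsampled branch, while 480 and 512 line up, which is consistent with the 480x480 images working and the 500x500 ones failing.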

Issue Analytics

Created 4 years ago
Comments: 12 (7 by maintainers)

Top GitHub Comments

nrweir commented, Sep 12, 2019

OK, sounds like we’re narrowing down. That was our experience on debugging also (error in inference, not train). I wonder if this would always be the case during training, or if it’s something specific to the augmentation pipeline used. If either of you are training models anytime soon and would like to try the following, it would help:

1. Do a training pilot run of one epoch with your normal augmentations and an “incompatible” shape (e.g. 500x500 for XD_XD’s model) and verify it works;
2. Do a training pilot run of one epoch with no augmentations and an “incompatible” shape and see if it works;
3. If 2. doesn’t work, do an epoch of training with only the following augmentation and see if it works:

training_augmentation:
  augmentations:
    PadIfNeeded:
      min_height: 512  # or whatever's the next biggest divisible-by-32 number
      min_width: 512

and then report back.

If I end up running some training I’ll do the same.

Thanks!
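
The "next biggest divisible-by-32 number" from the config comment above can be worked out with a couple of lines of arithmetic. The helpers below are hypothetical (they are not part of solaris or of any augmentation library), and the centered split of the padding is an assumption made for illustration, so treat this as a rough sketch rather than what PadIfNeeded does internally.

```python
def next_multiple(n, base=32):
    """Smallest multiple of `base` that is >= n."""
    return ((n + base - 1) // base) * base


def centered_padding(height, width, base=32):
    """Return (top, bottom, left, right) padding needed to reach the next multiple.

    Hypothetical helper: splits the extra pixels as evenly as possible.
    """
    pad_h = next_multiple(height, base) - height
    pad_w = next_multiple(width, base) - width
    top, left = pad_h // 2, pad_w // 2
    return top, pad_h - top, left, pad_w - left


print(next_multiple(500))          # value to use for min_height / min_width
print(centered_padding(500, 500))  # pixels to add on each side
```

For a 500x500 image this gives 512 as the target size, matching the values in the config snippet, and a size that is already a multiple of 32 (such as 480) needs no padding.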

nrweir commented, Oct 7, 2019

@KPGeo thanks for this, very good to know. We can make those changes unless you want to put a PR in.