Error using visualize_attention.py. The size of tensor a (3234) must match the size of tensor b (3181) at non-singleton dimension 1
Hi all, I am trying to run visualize_attention.py with the default pretrained weights on my own image, as below:
!python visualize_attention.py --image_path 'test/finalImg_249.png'
I get a size mismatch error. Could you please let me know what changes need to be made here?
Error stack trace:
Please use the --pretrained_weights argument to indicate the path of the checkpoint to evaluate.
Since no pretrained weights have been provided, we load the reference pretrained DINO weights.
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3458: UserWarning: Default upsampling behavior when mode=bicubic is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3503: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
Traceback (most recent call last):
  File "visualize_attention.py", line 162, in <module>
    attentions = model.forward_selfattention(img.to(device))
  File "~/dino/vision_transformer.py", line 246, in forward_selfattention
    x = x + pos_embed
RuntimeError: The size of tensor a (3234) must match the size of tensor b (3181) at non-singleton dimension 1
Image details:

import cv2
img = cv2.imread('finalImg_249.png')
print(img.shape)  # output: (427, 488, 3)
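For what it's worth, both numbers in the error are consistent with that shape, assuming the script's defaults at this commit (ViT-S with patch size 8, and the input cropped down to a multiple of the patch size rather than resized). A quick sanity check of the arithmetic:

patch_size = 8
h, w = 427, 488                                    # finalImg_249.png
h, w = h - h % patch_size, w - w % patch_size      # crop to multiples of 8 -> 424 x 488
grid_h, grid_w = h // patch_size, w // patch_size  # -> 53 x 61 patches

print(1 + grid_h * grid_w)        # 3234 = CLS token + 53*61 patches -> "tensor a"
print(1 + grid_h * (grid_w - 1))  # 3181 = CLS token + a 53x60 grid -> "tensor b",
                                  # i.e. a positional embedding one column short

So the positional embedding that reaches x = x + pos_embed is missing one column of the patch grid, which points at the interpolation/padding logic in forward_selfattention.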
Top GitHub Comments
This could be the issue: https://github.com/facebookresearch/dino/blob/8aa93fdc90eae4b183c4e3c005174a9f634ecfbf/vision_transformer.py#L238-L240

Here it is pos_embed, while in the previous case, https://github.com/facebookresearch/dino/blob/8aa93fdc90eae4b183c4e3c005174a9f634ecfbf/vision_transformer.py#L235-L237, it is patch_pos_embed.
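To make the effect concrete, here is a minimal, self-contained illustration of that typo with the shapes from this issue (toy tensors, not the repo's verbatim code): the second torch.cat is assigned to pos_embed instead of patch_pos_embed, so the column padding is computed and then silently discarded:

import torch

# Interpolated positional grid came out one column short: 53x60 instead of 53x61.
patch_pos_embed = torch.zeros(1, 384, 53, 60)
w0, h0 = 53, 61  # target patch grid for a 424x488 image with patch size 8

if w0 != patch_pos_embed.shape[-2]:  # 53 == 53: no row padding needed
    helper = torch.zeros(1, 384, w0 - patch_pos_embed.shape[-2], patch_pos_embed.shape[-1])
    patch_pos_embed = torch.cat((patch_pos_embed, helper), dim=-2)
if h0 != patch_pos_embed.shape[-1]:  # 61 != 60: one column of padding needed
    helper = torch.zeros(1, 384, w0, h0 - patch_pos_embed.shape[-1])
    pos_embed = torch.cat((patch_pos_embed, helper), dim=-1)  # typo: should reassign patch_pos_embed

print(patch_pos_embed.shape)  # still torch.Size([1, 384, 53, 60]) -> 1 + 53*60 = 3181 tokens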
Hi all, yes that's definitely a typo. It is fixed in this commit: https://github.com/facebookresearch/dino/commit/91fd052deff3106feef93c4ac6791e89effc84a2
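For anyone stuck on an older checkout, here is a minimal standalone sketch of the corrected idea (my own reimplementation under the assumptions above, not the repo's exact code): interpolate the pretrained 28x28 positional grid directly to the target patch grid, nudging the scale factors up slightly so that the floor inside interpolate cannot land one cell short (similar in spirit to the small offset used upstream):

import math
import torch
import torch.nn as nn

def resize_pos_embed(pos_embed, w0, h0):
    # pos_embed: (1, 1 + N, dim) with N a perfect square
    # (e.g. N = 28*28 = 784 for ViT-S/8 pretrained at 224x224);
    # w0, h0: target patch grid, e.g. 53 x 61 for a 424x488 image.
    N = pos_embed.shape[1] - 1
    dim = pos_embed.shape[-1]
    class_pos_embed = pos_embed[:, :1]
    patch_pos_embed = pos_embed[:, 1:]
    side = int(math.sqrt(N))
    patch_pos_embed = nn.functional.interpolate(
        patch_pos_embed.reshape(1, side, side, dim).permute(0, 3, 1, 2),
        # the +0.1 guards against floor() truncating 53.0 down to 52, etc.
        scale_factor=((w0 + 0.1) / side, (h0 + 0.1) / side),
        mode='bicubic',
    )
    assert patch_pos_embed.shape[-2:] == (w0, h0)
    patch_pos_embed = patch_pos_embed.permute(0, 2, 3, 1).reshape(1, -1, dim)
    return torch.cat((class_pos_embed, patch_pos_embed), dim=1)

pos = resize_pos_embed(torch.randn(1, 785, 384), 53, 61)
print(pos.shape)  # torch.Size([1, 3234, 384]) -> matches the 3234 CLS+patch tokens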