`interpolate_pos_encoding(x, pos_embed)` doesn't return the correct dimensions for images that are not square (w != h)

See original GitHub issue

I noticed that the generation of the positional embedding in the `interpolate_pos_encoding` method is slightly different from the one in the `forward_selfattention` method. The following simple modification brings both onto the same page, in case it is of interest:
```python
def interpolate_pos_encoding(self, x, pos_embed, w, h):  # pass w and h as arguments
    npatch = x.shape[1] - 1
    N = pos_embed.shape[1] - 1
    if npatch == N:
        return pos_embed
    class_emb = pos_embed[:, 0]
    pos_embed = pos_embed[:, 1:]
    dim = x.shape[-1]
    w0 = w // self.patch_embed.patch_size  # copied from forward_selfattention
    h0 = h // self.patch_embed.patch_size
    pos_embed = nn.functional.interpolate(
        pos_embed.reshape(1, int(math.sqrt(N)), int(math.sqrt(N)), dim).permute(0, 3, 1, 2),
        # per-axis scales, as in forward_selfattention, instead of math.sqrt(npatch / N)
        scale_factor=(w0 / math.sqrt(N), h0 / math.sqrt(N)),
        mode='bicubic',
    )
    pos_embed = pos_embed.permute(0, 2, 3, 1).view(1, -1, dim)
    return torch.cat((class_emb.unsqueeze(0), pos_embed), dim=1)
```
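To see why the per-axis scale factors matter, the arithmetic below sketches the grid sizes involved. The concrete numbers (a 480x640 input, 16-pixel patches, 224x224 pretraining resolution) are assumptions for illustration and do not come from the issue itself:

```python
import math

# Hypothetical numbers: a 480x640 (h x w) input, 16x16 patches, and a
# model pretrained at 224x224 (none of these come from the issue itself).
patch_size = 16
w, h = 640, 480

w0 = w // patch_size           # 40 patch tokens along the width
h0 = h // patch_size           # 30 patch tokens along the height
npatch = w0 * h0               # 1200 tokens for the non-square image

N = (224 // patch_size) ** 2   # 196 pretrained positions (a 14x14 grid)
side = int(math.sqrt(N))       # 14

# A single scale derived from sqrt(npatch / N) only makes sense if the
# token grid is square; here sqrt(npatch) is not even an integer, so no
# square grid with that side length exists:
assert math.sqrt(npatch) != int(math.sqrt(npatch))

# Scaling each axis independently (the proposed fix) recovers the
# intended 40x30 grid from the pretrained 14x14 one:
assert round(side * (w0 / math.sqrt(N))) == 40
assert round(side * (h0 / math.sqrt(N))) == 30
```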
Issue Analytics
- State: closed
- Created 2 years ago
- Comments: 11 (3 by maintainers)
Top GitHub Comments
Hi @KeremTurgutlu, let me open a new issue 😃
@enverfakhan I have incorporated your suggested fix for the floating point error and have also been trying to improve the forward logic in the vision_transformer.py code. Thanks a lot for your suggestion and feedback is appreciated if you do have some time 😃. https://github.com/facebookresearch/dino/blob/6687929d7cdc2e7a5150f6e24c2b6713293944ac/vision_transformer.py#L174-L233
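For context on the floating point error mentioned here: `nn.functional.interpolate` turns a fractional `scale_factor` into an output size via a floor, so a scale like `w0 / math.sqrt(N)` that rounds a hair below the exact ratio can yield an output grid one patch too small. The linked code appears to address this by nudging `w0` and `h0` by a small offset before dividing. A minimal stdlib-only sketch of that idea (the grid numbers are assumptions, not taken from the issue):

```python
import math

# Assumed setup: N = 196 pretrained positions, i.e. a 14x14 grid.
N = 196
side = int(math.sqrt(N))  # 14

def grid_after_interpolate(w0, eps=0.0):
    """Output width computed the way interpolate() does: floor(size * scale)."""
    scale = (w0 + eps) / math.sqrt(N)
    return math.floor(side * scale)

# With a small epsilon added to the target size, the floor always lands
# on the intended number of patches, regardless of rounding direction:
assert all(grid_after_interpolate(w0, eps=0.1) == w0 for w0 in range(1, 1000))
```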
I’m closing this issue. Feel free to reopen if there are other problems related to the interpolation of the positional encodings.
That’s slightly disappointing 😕. Have you tried the other models? For example, ViT-Base/16 should be more manageable memory-wise. As a matter of fact, on copy detection datasets I’ve found the base models to perform clearly better than the small ones: I get better performance with Base16x16 than with Small8x8, though Small8x8 is better at k-NN ImNet.
Yes your solution is definitely better ! I’ll update that in the code.