`interpolate_pos_encoding(x, pos_embed)` doesn't return the correct dimension for images that are not square (w != h)


I noticed that the positional embedding generated in the `interpolate_pos_encoding` method differs slightly from the one in the `forward_selfattention` method. The following simple modification brings the two in line, in case it is of interest.

    def interpolate_pos_encoding(self, x, pos_embed, w, h):  # passing w and h as arguments
        npatch = x.shape[1] - 1
        N = pos_embed.shape[1] - 1
        if npatch == N:
            return pos_embed
        class_emb = pos_embed[:, 0]
        pos_embed = pos_embed[:, 1:]
        dim = x.shape[-1]
        w0 = w // self.patch_embed.patch_size  # just copy paste from forward_selfattention
        h0 = h // self.patch_embed.patch_size
        pos_embed = nn.functional.interpolate(
            pos_embed.reshape(1, int(math.sqrt(N)), int(math.sqrt(N)), dim).permute(0, 3, 1, 2),
            scale_factor=(w0 / math.sqrt(N), h0 / math.sqrt(N)),  # replace math.sqrt(npatch / N) with one from forward_selfattention
            mode='bicubic',
        )
        pos_embed = pos_embed.permute(0, 2, 3, 1).view(1, -1, dim)
        return torch.cat((class_emb.unsqueeze(0), pos_embed), dim=1)
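To make the mismatch concrete, here is a minimal pure-Python sketch of the token-count arithmetic only; it does not call PyTorch. It assumes the interpolated size along each axis is `floor(input_size * scale_factor)`, and the helper names, the 14x14 pretrained grid, and the 480x320 image are illustrative choices rather than values from the issue.

```python
import math


def square_grid_tokens(npatch, n_pretrained):
    """Patch tokens produced when one scale factor is applied to both axes."""
    side = int(math.sqrt(n_pretrained))        # pretrained grid is side x side
    scale = math.sqrt(npatch / n_pretrained)   # single factor for both axes
    out_side = math.floor(side * scale)        # assumed rule: floor(size * scale)
    return out_side * out_side


def per_axis_grid(w, h, patch_size):
    """Target grid when each axis is handled separately (hypothetical helper)."""
    return w // patch_size, h // patch_size


patch_size = 16
n_pretrained = 14 * 14                         # e.g. 224 / 16 = 14 patches per side
w, h = 480, 320                                # non-square input
w0, h0 = per_axis_grid(w, h, patch_size)
npatch = w0 * h0                               # tokens the model actually produces

print("needed:", npatch)
print("single scale factor:", square_grid_tokens(npatch, n_pretrained))
print("per-axis grid:", (w0, h0), "->", w0 * h0)
```

With these numbers the single scale factor yields a 24x24 grid (576 tokens) while the input has 30x20 = 600 patch tokens, so the positional embedding cannot be added to `x`. The per-axis grid matches by construction.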

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

3 reactions
mathildecaron31 commented, May 13, 2021

Hi @KeremTurgutlu , let me open a new issue 😃

@enverfakhan I have incorporated your suggested fix for the floating point error and have also been trying to improve the forward logic in the vision_transformer.py code. Thanks a lot for your suggestion; feedback is appreciated if you have some time 😃. https://github.com/facebookresearch/dino/blob/6687929d7cdc2e7a5150f6e24c2b6713293944ac/vision_transformer.py#L174-L233

I’m closing this issue. Feel free to reopen it if there is any other problem related to the interpolation of the positional encodings.

3 reactions
mathildecaron31 commented, May 2, 2021

> I actually tried the deit_small(patch_size=8) for a retrieval task on in-house data, it seems to be working on par with a supervised vgg imagenet,

That’s slightly disappointing 😕. Have you tried the other models? For example, ViT-Base/16 should be more manageable memory-wise. As a matter of fact, on copy detection datasets, I’ve found the base models to perform clearly better than the small ones: I get better performance with Base16x16 than with Small8x8, though Small8x8 is better at k-NN ImNet.

> About the workaround for the floating point error, I feel like incrementing the w0 and h0 a small amount is more legit than zero padding the pos_embed but it is probably not a big deal especially if the image size is relatively big.

Yes, your solution is definitely better! I’ll update that in the code.
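The idea of incrementing `w0` and `h0` by a small amount can be sketched in plain Python. This is only an illustration of the arithmetic, assuming the interpolated size is `floor(input_size * scale_factor)`; the function name and the 0.1 increment are assumptions made for the example, not taken from the comments above.

```python
import math


def interpolated_size(side, target, nudge=0.0):
    """Size obtained when scaling `side` by (target + nudge) / side.

    Assumes the output size is floor(side * scale_factor).
    """
    scale = (target + nudge) / side
    return math.floor(side * scale)


side = 14  # pretrained grid side, e.g. 224 / 16

# Targets for which the computed size differs from the requested one.
naive_misses = [t for t in range(1, 200) if interpolated_size(side, t) != t]
nudged_misses = [t for t in range(1, 200)
                 if interpolated_size(side, t, nudge=0.1) != t]

print("misses without increment:", naive_misses)
print("misses with 0.1 increment:", nudged_misses)
```

Without the increment, `side * (target / side)` can round to just below `target`, in which case the floor drops one row or column. Adding a small increment before dividing keeps the product safely above `target`, so the floor returns the requested size.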
