Redundant normalisation of image and text features in OWL-ViT

Who can help?

@alaradirik

Issue description

Hi,

Thank you for the codebase! As the title suggests, I think that in modeling_owlvit.py the image and text features are normalised twice, while in the original codebase from Google Research they are normalised only once. In particular, in modeling_owlvit.py the image and text features are normalised both in lines 1073-1074 and in lines 1145-1146. By contrast, in the original code (https://github.com/google-research/scenic/blob/main/scenic/projects/owl_vit/layers.py), the features are normalised only in lines 86-89, whereas in line 144 the normalisation parameter is set to normalize=False, with a comment explicitly saying "Don't normalize image and text embeddings".

I think this is sensible, as there is no reason for double normalisation, which normally leads to performance degradation. Please let me know what you think, and whether I'm wrong, as I might be missing something.
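To make the "normalise once" pattern concrete, here is a minimal pure-Python sketch, assuming L2 normalisation (scaling each feature vector to unit Euclidean length). The functions `embed` and `class_logits` and their `normalize` flag are hypothetical stand-ins, not the actual Hugging Face or Scenic APIs: features are normalised once at the output stage, and the head is then called with `normalize=False`, mirroring the Scenic choice described above.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit Euclidean length (assumes v is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def embed(raw: Vector) -> Vector:
    # Hypothetical encoder output stage: the single place where features are normalised.
    return l2_normalize(raw)


def class_logits(image_embeds: List[Vector], text_embeds: List[Vector],
                 normalize: bool = False) -> List[List[float]]:
    # Hypothetical prediction head. normalize=False means the inputs are expected
    # to be normalised already, so the head does not normalise them again.
    if normalize:
        image_embeds = [l2_normalize(v) for v in image_embeds]
        text_embeds = [l2_normalize(t) for t in text_embeds]
    # Dot products of unit vectors are cosine similarities.
    return [[sum(a * b for a, b in zip(img, txt)) for txt in text_embeds]
            for img in image_embeds]


image = [embed([3.0, 4.0]), embed([1.0, 0.0])]
text = [embed([0.0, 2.0])]
logits = class_logits(image, text)  # one row per image embedding, one column per query
print(logits)
```

The design choice is simply to keep normalisation in one place and make the head's expectation explicit through the flag, rather than normalising defensively at every stage.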

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
ekazakos commented, Oct 11, 2022

Glad I could help! Could you please let me know whether this boosts validation performance at all?

0 reactions
alaradirik commented, Oct 18, 2022

Hey @ekazakos, sorry for the delay! The issue will be fixed with this PR, but it doesn't affect performance, as double normalization yields the same results.
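Why double normalization is harmless follows from L2 normalization being idempotent: a unit-length vector has norm 1, so dividing it by its norm again leaves it unchanged. The pure-Python check below assumes plain L2 normalization. The optional `eps` term is an added assumption, included to show that an implementation adding a small constant to the denominator changes the result only at roughly that scale; the thread does not say whether modeling_owlvit.py does this.

```python
import math
from typing import List


def l2_normalize(v: List[float], eps: float = 0.0) -> List[float]:
    # Divide by the Euclidean norm; eps is an optional (assumed) stabilising constant.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def max_abs_diff(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


feature = [0.3, -1.2, 2.5, 0.7]  # arbitrary example feature vector

# Plain L2 normalization: the second pass changes nothing beyond rounding error.
once = l2_normalize(feature)
twice = l2_normalize(once)
diff_plain = max_abs_diff(once, twice)

# With a small constant in the denominator, the second pass is no longer exactly
# the identity, but the change stays on the order of eps.
once_eps = l2_normalize(feature, eps=1e-6)
twice_eps = l2_normalize(once_eps, eps=1e-6)
diff_eps = max_abs_diff(once_eps, twice_eps)

print(f"plain: {diff_plain:.1e}, with eps: {diff_eps:.1e}")
```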
