Redundant normalisation of image and text features in OWL-ViT
Issue description
Hi,
Thank you for the codebase! As the title suggests, I think that in modeling_owlvit.py the image and text features are normalised twice, whereas in the original codebase from Google Research they are normalised only once. In particular, in modeling_owlvit.py the image and text features are normalised both in lines 1073-1074 and in lines 1145-1146. In contrast, in the original code (https://github.com/google-research/scenic/blob/main/scenic/projects/owl_vit/layers.py) the features are normalised only in lines 86-89, while in line 144 the normalisation parameter is set to normalize=False, with a comment explicitly saying "Don't normalize image and text embeddings".
I think this is sensible, as there is no reason for double normalisation, which normally leads to performance degradation. Please let me know what you think, and whether I'm wrong, as I might be missing something.
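A minimal sketch in plain Python of the two code paths being compared. This is not the actual Hugging Face or scenic code: the helpers l2_normalize, embed and similarity_logit, and the normalize flag on similarity_logit, are hypothetical and only loosely modeled on the normalize=False setting described above. The embeddings are normalised once inside embed, and similarity_logit either normalises them a second time or skips that step.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector, eps: float = 1e-12) -> Vector:
    """Scale v to unit L2 norm; eps guards against division by zero."""
    norm = max(math.sqrt(sum(x * x for x in v)), eps)
    return [x / norm for x in v]


def embed(features: Vector) -> Vector:
    # Hypothetical embedding step that already L2-normalizes its output
    # (the "first" normalization).
    return l2_normalize(features)


def similarity_logit(image_emb: Vector, text_emb: Vector, normalize: bool) -> float:
    # Hypothetical scoring step. With normalize=True the embeddings are
    # normalized a second time; normalize=False skips that, loosely
    # mirroring the normalize=False setting described in the issue.
    if normalize:
        image_emb = l2_normalize(image_emb)
        text_emb = l2_normalize(text_emb)
    return sum(a * b for a, b in zip(image_emb, text_emb))


image_emb = embed([3.0, 4.0, 12.0])  # norm 13
text_emb = embed([1.0, -2.0, 2.0])   # norm 3

double = similarity_logit(image_emb, text_emb, normalize=True)   # normalized twice
single = similarity_logit(image_emb, text_emb, normalize=False)  # normalized once

print(double, single)
print(math.isclose(double, single))  # True: only float rounding can differ
```

Because the vectors returned by embed already have unit norm, the extra normalisation in similarity_logit divides each one by (approximately) 1, so both calls give 19/39 up to floating-point rounding.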
Top GitHub Comments
Glad I could help! Could you please let me know whether this boosts validation performance at all?
Hey @ekazakos, sorry for the delay! The issue will be fixed with this PR but it doesn’t affect the performance as double normalization yields the same results.
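The reason double normalisation yields the same results is that L2 normalisation is idempotent. A short derivation, assuming nonzero feature vectors and normalisation implemented as plain division by the L2 norm:

```latex
\hat{x} = \frac{x}{\lVert x \rVert_2}, \qquad
\lVert \hat{x} \rVert_2 = \frac{\lVert x \rVert_2}{\lVert x \rVert_2} = 1
\quad\Longrightarrow\quad
\frac{\hat{x}}{\lVert \hat{x} \rVert_2} = \hat{x}.
```

Applying this to both an image embedding x and a text embedding t, the similarity logit computed from the normalised vectors is unchanged by a second normalisation, apart from floating-point rounding.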