
Add image-guided object detection support to OWL-ViT


Hi,

The OWL-ViT model is an open-vocabulary model that can be used for both zero-shot text-guided (supported) and one-shot image-guided (not supported) object detection.

It would be great to add support for one-shot object detection to OwlViTForObjectDetection, so that users can query an image with an image of the target object instead of a text query - e.g., using an image of a butterfly to find all butterfly instances in the target image. See the example below.

[Screenshot (2022-08-24): example of image-guided detection, finding instances in a target image that match a query image]

To do this, we would only need to compute the OwlViTModel (an alias of CLIP) embeddings of the query images and use them in place of the text query embeddings within OwlViTForObjectDetection.forward(), which would then take the target image plus either text queries or image queries as input. Similarly, OwlViTProcessor would be updated to preprocess pairs of (image, text) and (image, query_image).
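The core of the proposal - swapping the text query embedding for a query-image embedding and scoring image regions against it - can be sketched with a toy example. This is a minimal illustration with NumPy, not the actual transformers API; the function name, shapes, and scoring scheme are hypothetical stand-ins for what the detection head does with its per-patch class embeddings:

```python
import numpy as np

def image_guided_scores(patch_embeds, query_embed):
    """Score each image patch against a query-image embedding.

    patch_embeds: (num_patches, dim) per-patch class embeddings from the detector
    query_embed:  (dim,) embedding of the query image, used in place of a text query
    Returns cosine-similarity scores of shape (num_patches,).
    """
    # L2-normalize both sides so the dot product is a cosine similarity
    p = patch_embeds / np.linalg.norm(patch_embeds, axis=-1, keepdims=True)
    q = query_embed / np.linalg.norm(query_embed)
    return p @ q

# Toy data: 4 patches with 8-dim embeddings; pretend patch 2 contains the target
rng = np.random.default_rng(0)
patches = rng.normal(size=(4, 8))
query = patches[2].copy()

scores = image_guided_scores(patches, query)
best = int(np.argmax(scores))
print(best)  # patch 2 matches its own embedding exactly, so it scores highest
```

In the real model the same machinery already exists for text queries; the change proposed here is only to source `query_embed` from the image tower instead of the text tower.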

@sgugger @NielsRogge @amyeroberts @LysandreJik what do you think about this? Would this be something we would like to support?

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
unography commented, Sep 1, 2022

sure, will do, thanks for informing!

1 reaction
unography commented, Aug 25, 2022

Hi @amyeroberts @alaradirik, I’m happy to take this up!


Top Results From Across the Web

  • Image-Guided OWL-ViT Demo - a Hugging Face Space by adirik: Gradio demo for image-guided / one-shot object detection with OWL-ViT - OWL-ViT, introduced in Simple Open-Vocabulary Object Detection with Vision ...
  • Alara Dirik on Twitter: "Transformers now supports image-guided object detection with OWL-ViT - find similar objects within an image using a query image of your ..."
  • OWL-ViT-inference example.ipynb - Colaboratory: Given an image and one or multiple free-text queries, it finds objects matching the queries in the image. Unlike traditional object detection models, ...
  • COCO Dataset | Papers With Code: object detection: bounding boxes and per-instance segmentation masks with 80 object categories; captioning: natural language descriptions of the images (see MS ...)
  • Niels Rogge's Post - LinkedIn: image-guided OWL-ViT: OWL-ViT is a model by Google that can do zero-shot object detection given text queries. Today we're extending this model ...
