"zero-shot-image-classification" pipeline with `VisionTextDualEncoderModel` needs manual feature_extractor and tokenizer input
See original GitHub issue

System Info
- transformers: 4.20.1
- platform: Windows 11, Google Colab
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction

```python
# works
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
pipe(images=url, candidate_labels=["a photo of one cat", "a photo of two cats"], hypothesis_template="{}")
```
```python
# error
from transformers import pipeline

pipe2 = pipeline("zero-shot-image-classification", model="Bingsu/vitB32_bert_ko_small_clip")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
pipe2(images=url, candidate_labels=["고양이 한 마리", "고양이 두 마리"], hypothesis_template="{}")
```
```
TypeError                                 Traceback (most recent call last)
<ipython-input-8-c1bcb0faaf45> in <module>()
----> 1 pipe2(images=url, candidate_labels=["고양이 한 마리", "고양이 두 마리"], hypothesis_template="{}")

3 frames
/usr/local/lib/python3.7/dist-packages/transformers/pipelines/zero_shot_image_classification.py in preprocess(self, image, candidate_labels, hypothesis_template)
     90         for i, candidate_label in enumerate(candidate_labels):
     91             image = load_image(image)
---> 92             images = self.feature_extractor(images=[image], return_tensors=self.framework)
     93             sequence = hypothesis_template.format(candidate_label)
     94             inputs = self.tokenizer(sequence, return_tensors=self.framework)

TypeError: 'NoneType' object is not callable
```
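The traceback shows that self.feature_extractor is None. A quick hedged check (using the pipe2 object built in the failing snippet above) makes this visible:

```python
# Hedged diagnostic: the pipeline was constructed without a feature extractor
# (and, per the issue title, the tokenizer also needs to be passed manually),
# so preprocess() raises TypeError: 'NoneType' object is not callable.
print(pipe2.feature_extractor)
print(pipe2.tokenizer)
```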
Currently I'm using it like this:

```python
from transformers import AutoModel, AutoProcessor, pipeline

model = AutoModel.from_pretrained("Bingsu/vitB32_bert_ko_small_clip")
processor = AutoProcessor.from_pretrained("Bingsu/vitB32_bert_ko_small_clip")
pipe = pipeline("zero-shot-image-classification", model=model,
                feature_extractor=processor.feature_extractor, tokenizer=processor.tokenizer)
```
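For reference, a minimal usage sketch of the manually assembled pipeline (the image URL and labels are the ones from the reproduction above; exact scores depend on the checkpoint):

```python
# Hedged usage sketch: run the workaround pipeline on the COCO test image
# with the Korean candidate labels from the failing snippet.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
result = pipe(images=url, candidate_labels=["고양이 한 마리", "고양이 두 마리"], hypothesis_template="{}")
print(result)  # a list of {"score": ..., "label": ...} dicts, sorted by score
```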
Expected behavior

It should work directly with pipeline("zero-shot-image-classification", model="Bingsu/vitB32_bert_ko_small_clip"), without passing the feature extractor and tokenizer manually.
Top GitHub Comments
Fixed per #18392
Hi, sorry for the late reply, I didn't see this until today:

The model https://huggingface.co/Bingsu/vitB32_bert_ko_small_clip is a VisionTextDualEncoder, but it's not defined within the AutoFeatureExtractor meta class (@NielsRogge), so the pipeline doesn't know about it and cannot load the feature_extractor. That's why passing it manually works. Basically the issue lies in transformers: when we added this model, it wasn't properly configured.

Cheers.
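As an illustration (not part of the original comment), a hedged sketch of why the automatic lookup fails on transformers 4.20.1; it assumes the public FEATURE_EXTRACTOR_MAPPING and VisionTextDualEncoderConfig exports:

```python
# Hedged sketch: the pipeline decides whether to auto-load a feature extractor by
# looking up the model's config class in FEATURE_EXTRACTOR_MAPPING. Per the comment
# above, VisionTextDualEncoderConfig is not registered there in 4.20.1, so the
# pipeline's feature_extractor stays None unless it is passed explicitly.
from transformers import FEATURE_EXTRACTOR_MAPPING, VisionTextDualEncoderConfig

try:
    FEATURE_EXTRACTOR_MAPPING[VisionTextDualEncoderConfig]
except KeyError:
    print("no feature extractor registered for VisionTextDualEncoderConfig")
```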