"zero-shot-image-classification" pipeline with `VisionTextDualEncoderModel` needs manual feature_extractor and tokenizer input
See original GitHub issue

System Info
- transformers: 4.20.1
- platform: Windows 11, Google Colab
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction

```python
# works
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
pipe(images=url, candidate_labels=["a photo of one cat", "a photo of two cats"], hypothesis_template="{}")
```
```python
# error
from transformers import pipeline

pipe2 = pipeline("zero-shot-image-classification", model="Bingsu/vitB32_bert_ko_small_clip")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
pipe2(images=url, candidate_labels=["고양이 한 마리", "고양이 두 마리"], hypothesis_template="{}")
```
```
TypeError                                 Traceback (most recent call last)
<ipython-input-8-c1bcb0faaf45> in <module>()
----> 1 pipe2(images=url, candidate_labels=["고양이 한 마리", "고양이 두 마리"], hypothesis_template="{}")

3 frames
/usr/local/lib/python3.7/dist-packages/transformers/pipelines/zero_shot_image_classification.py in preprocess(self, image, candidate_labels, hypothesis_template)
     90         for i, candidate_label in enumerate(candidate_labels):
     91             image = load_image(image)
---> 92             images = self.feature_extractor(images=[image], return_tensors=self.framework)
     93             sequence = hypothesis_template.format(candidate_label)
     94             inputs = self.tokenizer(sequence, return_tensors=self.framework)

TypeError: 'NoneType' object is not callable
```
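The traceback shows that self.feature_extractor is None. A quick hedged check (using the pipe2 object built in the failing snippet above) makes this visible:

```python
# Hedged diagnostic: the pipeline was constructed without a feature extractor
# (and, per the issue title, the tokenizer also needs to be passed manually),
# so preprocess() raises TypeError: 'NoneType' object is not callable.
print(pipe2.feature_extractor)
print(pipe2.tokenizer)
```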
Currently I'm using it like this:

```python
from transformers import AutoModel, AutoProcessor, pipeline

model = AutoModel.from_pretrained("Bingsu/vitB32_bert_ko_small_clip")
processor = AutoProcessor.from_pretrained("Bingsu/vitB32_bert_ko_small_clip")
pipe = pipeline("zero-shot-image-classification", model=model,
                feature_extractor=processor.feature_extractor, tokenizer=processor.tokenizer)
```
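For reference, a minimal usage sketch of the manually assembled pipeline (the image URL and labels are the ones from the reproduction above; exact scores depend on the checkpoint):

```python
# Hedged usage sketch: run the workaround pipeline on the COCO test image
# with the Korean candidate labels from the failing snippet.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
result = pipe(images=url, candidate_labels=["고양이 한 마리", "고양이 두 마리"], hypothesis_template="{}")
print(result)  # a list of {"score": ..., "label": ...} dicts, sorted by score
```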
Expected behavior

It should work directly with pipeline("zero-shot-image-classification", model="Bingsu/vitB32_bert_ko_small_clip"), without passing the feature extractor and tokenizer manually.
Top GitHub Comments
Fixed per #18392
Hi, sorry for the late reply, I didn't see this until today:

The model https://huggingface.co/Bingsu/vitB32_bert_ko_small_clip is a VisionTextDualEncoder, but it's not defined within the AutoFeatureExtractor meta class (@NielsRogge), so the pipeline doesn't know about it and cannot load the feature_extractor. That's why passing it manually works. Basically the issue lies in transformers: when we added this model, it wasn't properly configured.

Cheers.
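As an illustration (not part of the original comment), a hedged sketch of why the automatic lookup fails on transformers 4.20.1; it assumes the public FEATURE_EXTRACTOR_MAPPING and VisionTextDualEncoderConfig exports:

```python
# Hedged sketch: the pipeline decides whether to auto-load a feature extractor by
# looking up the model's config class in FEATURE_EXTRACTOR_MAPPING. Per the comment
# above, VisionTextDualEncoderConfig is not registered there in 4.20.1, so the
# pipeline's feature_extractor stays None unless it is passed explicitly.
from transformers import FEATURE_EXTRACTOR_MAPPING, VisionTextDualEncoderConfig

try:
    FEATURE_EXTRACTOR_MAPPING[VisionTextDualEncoderConfig]
except KeyError:
    print("no feature extractor registered for VisionTextDualEncoderConfig")
```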