TFClipModel fails to train because of None loss
System Info
- `transformers` version: 4.21.1
- Platform: macOS Big Sur 11.6.7
- Python version: 3.8.13
- Huggingface_hub version: 0.8.1
- TensorFlow version (GPU?): 2.7.3 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
This is the script run to attempt to fit the model to the example data. It is verbatim from the 4.21.1 docs, with the addition of `model.fit`. The same error arose when working with my own project. The loss is always `None`, as are `y` and `y_pred`, somewhere in the logic of https://github.com/huggingface/transformers/blob/132402d752044301b37e54405832738b16f49df6/src/transformers/modeling_tf_utils.py#L1116.
```python
from PIL import Image
import requests
from transformers import CLIPProcessor, TFCLIPModel
import tensorflow as tf

model = TFCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"], images=[image, image], return_tensors="tf", padding=True
)

outputs = model(**inputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00001))
model.fit(dict(inputs))
```
Running this raises a zero-gradient error, because the gradients are all `0`s, which I expect is caused by `y` and `y_pred` both being empty dicts.
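A minimal, hypothetical sketch in plain Python (not the actual Keras or `transformers` internals) of why the built-in loss ends up `None`: the training loop picks the loss out of the model outputs, but `TFCLIPModel` only computes its contrastive loss when `return_loss=True`, so by default there is nothing to differentiate.

```python
# Hypothetical sketch of the failure mode described above; the dict keys and
# loss value are illustrative, not real model outputs.

def extract_loss(outputs):
    # Loosely mimics the training loop looking for a "loss" entry in the
    # model's output dict.
    return outputs.get("loss")

# Without return_loss=True the output dict carries only the logits:
outputs_default = {"logits_per_image": [[0.1, 0.2]], "logits_per_text": [[0.1], [0.2]]}
assert extract_loss(outputs_default) is None  # fit() sees no loss -> zero gradients

# If the model had been asked to compute its loss, fit() would have
# something to minimize:
outputs_with_loss = {**outputs_default, "loss": 0.693}
assert extract_loss(outputs_with_loss) == 0.693
```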
Expected behavior
`model.fit()` on inputs from the processor completes a training step without error.
Issue Analytics
- Created a year ago
- Comments: 14 (6 by maintainers)
Top GitHub Comments
@taymills No problem! Fixing the tests has exposed a few other issues though, which that PR will need to fix as well. Unfortunately, you’re stuck in the PR branch for now, but I’ll ping you and close this issue when it’s merged to main!
@taymills yes, that’s part of this PR! When using the built-in loss, we now force `return_loss=True` for models where it is an argument. That should avoid this for CLIP and for other similar models in future.
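A rough illustration of that fix (a hypothetical sketch, not the actual PR code): inspect the model call's signature, and if `return_loss` is among its arguments, force it to `True` before calling the model, so the built-in training loop gets a real loss back.

```python
import inspect

def call_with_forced_loss(model_call, inputs):
    # Hypothetical sketch of the fix the maintainer describes: if the call
    # signature accepts return_loss, force it on so the model returns a loss.
    if "return_loss" in inspect.signature(model_call).parameters:
        inputs = {**inputs, "return_loss": True}
    return model_call(**inputs)

# Toy stand-in for TFCLIPModel.call, which only computes a loss when asked:
def toy_clip_call(pixel_values, input_ids=None, return_loss=False):
    return {"loss": 0.5} if return_loss else {"loss": None}

out = call_with_forced_loss(toy_clip_call, {"pixel_values": [1.0]})
assert out["loss"] == 0.5  # loss is no longer None
```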