TFClipModel fails to train because of None loss
System Info
- `transformers` version: 4.21.1
- Platform: macOS Big Sur 11.6.7
- Python version: 3.8.13
- Huggingface_hub version: 0.8.1
- TensorFlow version (GPU?): 2.7.3 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
This is the script run to attempt to fit the model to the example data. It is verbatim from the 4.21.1 docs, with the addition of `model.fit`. The same error arose when working with my own project. The loss is always `None`, as are `y` and `y_pred`, somewhere in the logic of https://github.com/huggingface/transformers/blob/132402d752044301b37e54405832738b16f49df6/src/transformers/modeling_tf_utils.py#L1116.
```python
from PIL import Image
import requests
from transformers import CLIPProcessor, TFCLIPModel
import tensorflow as tf

model = TFCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"], images=[image, image], return_tensors="tf", padding=True
)

outputs = model(**inputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00001))
model.fit(dict(inputs))
```
Running this raises a zero-gradient error, because the gradients are all `0`s, which I expect is caused by `y` and `y_pred` both being empty dicts.
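A minimal, hypothetical sketch in plain Python (not the actual Keras or `transformers` internals) of why the built-in loss ends up `None`: the training loop picks the loss out of the model outputs, but `TFCLIPModel` only computes its contrastive loss when `return_loss=True`, so by default there is nothing to differentiate.

```python
# Hypothetical sketch of the failure mode described above; the dict keys and
# loss value are illustrative, not real model outputs.

def extract_loss(outputs):
    # Loosely mimics the training loop looking for a "loss" entry in the
    # model's output dict.
    return outputs.get("loss")

# Without return_loss=True the output dict carries only the logits:
outputs_default = {"logits_per_image": [[0.1, 0.2]], "logits_per_text": [[0.1], [0.2]]}
assert extract_loss(outputs_default) is None  # fit() sees no loss -> zero gradients

# If the model had been asked to compute its loss, fit() would have
# something to minimize:
outputs_with_loss = {**outputs_default, "loss": 0.693}
assert extract_loss(outputs_with_loss) == 0.693
```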
Expected behavior
`model.fit()` on inputs from the processor completes a training step without error.
Issue Analytics
- Created a year ago
- Comments: 14 (6 by maintainers)
Top GitHub Comments
@taymills No problem! Fixing the tests has exposed a few other issues though, which that PR will need to fix as well. Unfortunately, you’re stuck in the PR branch for now, but I’ll ping you and close this issue when it’s merged to main!
@taymills yes, that’s part of this PR! When using the built-in loss, we now force `return_loss=True` for models where it is an argument. That should avoid this for CLIP and for other similar models in future.
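A rough illustration of that fix (a hypothetical sketch, not the actual PR code): inspect the model call's signature, and if `return_loss` is among its arguments, force it to `True` before calling the model, so the built-in training loop gets a real loss back.

```python
import inspect

def call_with_forced_loss(model_call, inputs):
    # Hypothetical sketch of the fix the maintainer describes: if the call
    # signature accepts return_loss, force it on so the model returns a loss.
    if "return_loss" in inspect.signature(model_call).parameters:
        inputs = {**inputs, "return_loss": True}
    return model_call(**inputs)

# Toy stand-in for TFCLIPModel.call, which only computes a loss when asked:
def toy_clip_call(pixel_values, input_ids=None, return_loss=False):
    return {"loss": 0.5} if return_loss else {"loss": None}

out = call_with_forced_loss(toy_clip_call, {"pixel_values": [1.0]})
assert out["loss"] == 0.5  # loss is no longer None
```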