Couldn't run the run_clip.py successfully

System Info

I couldn’t run the code successfully following the README.md (https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text#readme).

"""
COCO_DIR = "data"
ds = datasets.load_dataset("ydshieh/coco_dataset_script", "2017", data_dir=COCO_DIR)
"""

"""
python examples/pytorch/contrastive-image-text/run_clip.py \
    --output_dir ./clip-roberta-finetuned \
    --model_name_or_path ./clip-roberta \
    --data_dir ./data \
    --dataset_name ydshieh/coco_dataset_script \
    --dataset_config_name=2017 \
    --image_column image_path \
    --caption_column caption \
    --remove_unused_columns=False \
    --do_train --do_eval \
    --per_device_train_batch_size="64" \
    --per_device_eval_batch_size="64" \
    --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
    --overwrite_output_dir \
    --push_to_hub
"""

The error is: FileNotFoundError: Couldn’t find file at https://huggingface.co/datasets/ydshieh/coco_dataset_script/resolve/main/data/train2017.zip
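The URL in the error ends in data/train2017.zip, which suggests the loader did not find train2017.zip in the local data directory and fell back to looking for it next to the dataset script on the Hub. That reading is an assumption and is not confirmed in this thread. A minimal sketch of a pre-flight check, run before calling load_dataset, is shown below. The helper name missing_archives and the default file list are hypothetical; only train2017.zip is taken from the error message, so extend the list with whatever archives your setup requires.

```python
from pathlib import Path


def missing_archives(data_dir, expected=("train2017.zip",)):
    """Return the names in `expected` that are not regular files in `data_dir`.

    `expected` is an assumption: only train2017.zip is named in the error above.
    """
    root = Path(data_dir).expanduser().resolve()
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    COCO_DIR = "data"
    missing = missing_archives(COCO_DIR)
    if missing:
        # Print the absolute path so it is clear which directory was checked.
        print(f"Missing in {Path(COCO_DIR).resolve()}: {missing}")
    else:
        print("All expected archives are present.")
```

If the check reports nothing missing and the error persists, passing the resolved absolute path as data_dir may be worth trying, since a relative path can be interpreted differently depending on the working directory.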

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, …)
My own task or dataset (give details below)

Reproduction

"""
COCO_DIR = "data"
ds = datasets.load_dataset("ydshieh/coco_dataset_script", "2017", data_dir=COCO_DIR)
"""

"""
python examples/pytorch/contrastive-image-text/run_clip.py \
    --output_dir ./clip-roberta-finetuned \
    --model_name_or_path ./clip-roberta \
    --data_dir ./data \
    --dataset_name ydshieh/coco_dataset_script \
    --dataset_config_name=2017 \
    --image_column image_path \
    --caption_column caption \
    --remove_unused_columns=False \
    --do_train --do_eval \
    --per_device_train_batch_size="64" \
    --per_device_eval_batch_size="64" \
    --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
    --overwrite_output_dir \
    --push_to_hub
"""

The error is: FileNotFoundError: Couldn’t find file at https://huggingface.co/datasets/ydshieh/coco_dataset_script/resolve/main/data/train2017.zip

Expected behavior

The code runs successfully.

Issue Analytics

Created a year ago
Comments: 13 (1 by maintainers)

Top GitHub Comments

1 reaction

lchwhut commented, Aug 24, 2022

The train split of “ds” that I got has only 80 examples, but the real size should be well over 20,000. As you can see, train2017.zip alone takes up 19 GB.
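A quick way to compare what is on disk with what the loaded split reports is to count the image entries in the archive itself. The sketch below uses only the standard-library zipfile module; the helper name count_images and the .jpg suffix are assumptions for illustration, not part of the dataset script. It reads only the archive's index, so it should stay fast even for a very large file, and it returns None when the file is not a readable zip (for example, after an interrupted download).

```python
import zipfile


def count_images(zip_path, suffix=".jpg"):
    """Count archive entries ending in `suffix`; return None if the zip is unreadable."""
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # namelist() reads the central directory only, not the image data.
            return sum(
                1 for name in archive.namelist() if name.lower().endswith(suffix)
            )
    except zipfile.BadZipFile:
        return None
```

For example, count_images("data/train2017.zip") can be compared against the number of rows in the train split; a large gap between the two would point at the loading step rather than the download.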

0 reactions

ydshieh commented, Aug 25, 2022

@lchwhut Thank you for the detailed information. Glad it works for you now. It’s probably good for me to make a comment on my dataset page regarding this.