Couldn't run run_clip.py successfully
System Info
I couldn't run the code successfully by following the README.md (https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text#readme).
"""
import datasets

COCO_DIR = "data"
ds = datasets.load_dataset("ydshieh/coco_dataset_script", "2017", data_dir=COCO_DIR)
"""
"""
python examples/pytorch/contrastive-image-text/run_clip.py \
    --output_dir ./clip-roberta-finetuned \
    --model_name_or_path ./clip-roberta \
    --data_dir ./data \
    --dataset_name ydshieh/coco_dataset_script \
    --dataset_config_name=2017 \
    --image_column image_path \
    --caption_column caption \
    --remove_unused_columns=False \
    --do_train --do_eval \
    --per_device_train_batch_size="64" \
    --per_device_eval_batch_size="64" \
    --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
    --overwrite_output_dir \
    --push_to_hub
"""
The error is: FileNotFoundError: Couldn't find file at https://huggingface.co/datasets/ydshieh/coco_dataset_script/resolve/main/data/train2017.zip
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
"""
import datasets

COCO_DIR = "data"
ds = datasets.load_dataset("ydshieh/coco_dataset_script", "2017", data_dir=COCO_DIR)
"""
"""
python examples/pytorch/contrastive-image-text/run_clip.py \
    --output_dir ./clip-roberta-finetuned \
    --model_name_or_path ./clip-roberta \
    --data_dir ./data \
    --dataset_name ydshieh/coco_dataset_script \
    --dataset_config_name=2017 \
    --image_column image_path \
    --caption_column caption \
    --remove_unused_columns=False \
    --do_train --do_eval \
    --per_device_train_batch_size="64" \
    --per_device_eval_batch_size="64" \
    --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
    --overwrite_output_dir \
    --push_to_hub
"""
The error is: FileNotFoundError: Couldn't find file at https://huggingface.co/datasets/ydshieh/coco_dataset_script/resolve/main/data/train2017.zip
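The URL in the error suggests that the relative data directory was resolved against the dataset repository on the Hub rather than a local folder, which would happen if the COCO archives are not present locally. Below is a minimal pre-flight sketch under that assumption. The helper names `missing_archives` and `checked_data_dir` are hypothetical, and only `train2017.zip` is named in the error message; the other archive names are assumptions based on the usual COCO 2017 download set.

```python
from pathlib import Path

# Only train2017.zip is named in the error message; the other entries are
# assumptions based on the usual COCO 2017 download set.
EXPECTED_ARCHIVES = (
    "train2017.zip",
    "val2017.zip",
    "annotations_trainval2017.zip",
)


def missing_archives(data_dir, expected=EXPECTED_ARCHIVES):
    """Return the expected archive names that are not present in data_dir."""
    root = Path(data_dir).expanduser().resolve()
    return [name for name in expected if not (root / name).is_file()]


def checked_data_dir(data_dir, expected=EXPECTED_ARCHIVES):
    """Return data_dir as an absolute path string, or raise if archives are missing."""
    root = Path(data_dir).expanduser().resolve()
    missing = missing_archives(root, expected)
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    return str(root)
```

The returned absolute path could then be passed as `data_dir=checked_data_dir("data")` to `datasets.load_dataset`, or as `--data_dir` to `run_clip.py`, so that a missing archive fails early with a local path in the message instead of a Hub URL.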
Expected behavior
The code runs successfully.
Issue Analytics
- Created a year ago
- Comments: 13 (1 by maintainers)
Top GitHub Comments
The train split of "ds" that I got has only 80 examples, but the real size should be well over 20,000. As you can see, train2012.zip alone takes up 19 GB.
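One way to tell whether a suspiciously small train split comes from the archive itself or from the loading step is to count the image entries in the local zip and compare that number with the size of the loaded split. This is a rough sketch using only the standard library; the helper name `count_images_in_zip` and the suffix list are hypothetical and not part of the dataset script.

```python
import zipfile

# Assumed image file extensions; adjust if the archive uses others.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def count_images_in_zip(zip_path):
    """Count image entries in a zip archive.

    Opening a truncated or corrupt archive typically raises
    zipfile.BadZipFile, which is itself a useful signal.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sum(
            1
            for info in archive.infolist()
            if not info.is_dir()
            and info.filename.lower().endswith(IMAGE_SUFFIXES)
        )
```

If the count is in the expected range but the loaded split is still tiny, the problem is more likely in how the dataset was loaded or cached than in the download.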
@lchwhut Thank you for the detailed information. Glad it works for you now. It’s probably good for me to make a comment on my dataset page regarding this.