run_clip.py RuntimeError
System Info
- `transformers` version: 4.22.0.dev0
- Platform: Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.9.12
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0+cu102 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
Who can help?
Hi @patil-suraj, when I run run_clip.py following the steps in the README, I get the following error:
[INFO|trainer.py:2644] 2022-08-02 04:07:15,699 >> Saving model checkpoint to clip-roberta-finetuned/checkpoint-4500
[INFO|configuration_utils.py:446] 2022-08-02 04:07:15,701 >> Configuration saved in clip-roberta-finetuned/checkpoint-4500/config.json
[INFO|modeling_utils.py:1567] 2022-08-02 04:07:17,602 >> Model weights saved in clip-roberta-finetuned/checkpoint-4500/pytorch_model.bin
/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
33%|████████████████████████████                                        | 4623/13872 [1:56:27<3:50:22, 1.49s/it]
Traceback (most recent call last):
File "/home/gsj/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 537, in <module>
main()
File "/home/gsj/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 508, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/trainer.py", line 1502, in train
return inner_training_loop(
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/trainer.py", line 1744, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/trainer.py", line 2474, in training_step
loss = self.compute_loss(model, inputs)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/trainer.py", line 2506, in compute_loss
outputs = model(**inputs)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
return self.gather(outputs, self.output_device)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 181, in gather
return gather(outputs, output_device, dim=self.dim)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/scatter_gather.py", line 78, in gather
res = gather_map(outputs)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/scatter_gather.py", line 69, in gather_map
return type(out)((k, gather_map([d[k] for d in outputs]))
File "<string>", line 10, in __init__
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/utils/generic.py", line 188, in __post_init__
for element in iterator:
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/scatter_gather.py", line 69, in <genexpr>
return type(out)((k, gather_map([d[k] for d in outputs]))
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/_functions.py", line 75, in forward
return comm.gather(inputs, ctx.dim, ctx.target_device)
File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/comm.py", line 235, in gather
return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 1 has invalid shape [4, 4], but expected [4, 5]
How can I solve this error? Thanks!
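The mismatched shapes in the error point at a common nn.DataParallel pitfall with contrastive losses. The sketch below is an illustration under assumed batch sizes (not taken from this thread): CLIP-style models return square similarity matrices of shape [batch, batch], so when DataParallel splits an uneven final batch across GPUs, the per-replica logits disagree in every dimension except the one being gathered, and the gather raises.

```python
# Sketch of the suspected failure mode (an assumption; exact shapes
# depend on the dataset). logits_per_image for one replica is a square
# image-text similarity matrix: [batch, batch].

def clip_logits_shape(per_gpu_batch):
    """Shape of logits_per_image for one DataParallel replica."""
    return (per_gpu_batch, per_gpu_batch)

def can_gather_dim0(shapes):
    """Gathering along dim 0 requires all remaining dims to agree."""
    return len({shape[1:] for shape in shapes}) == 1

# A hypothetical uneven final batch of 9 examples on 2 GPUs splits into
# 5 and 4, so the square logits disagree in dim 1.
uneven = [clip_logits_shape(5), clip_logits_shape(4)]
even = [clip_logits_shape(4), clip_logits_shape(4)]
print(can_gather_dim0(uneven))  # False -> RuntimeError at gather time
print(can_gather_dim0(even))    # True  -> gather succeeds
```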
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
python run_clip.py
Expected behavior
run_clip.py runs to completion successfully
Issue Analytics
- State:
- Created a year ago
- Comments: 11 (1 by maintainers)
Top GitHub Comments
Hi, @ydshieh I got it, thanks a lot!
Hi @ydshieh, yes, I used two GPUs for training
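Since the comment above confirms two GPUs (i.e. nn.DataParallel), one common workaround for this class of error is to drop the incomplete final batch so every replica always sees the same batch size. The invocation below is a sketch, not a command from this thread; `--dataloader_drop_last` is a standard `TrainingArguments` flag, and the remaining arguments are elided.

```shell
# Sketch only: follow the README for the full command.
# --dataloader_drop_last discards the uneven last batch, so
# DataParallel replicas always receive equal batch sizes.
python run_clip.py \
    --dataloader_drop_last True \
    ...  # other arguments as in the README
```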