Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

run_clip.py RuntimeError

See original GitHub issue

System Info

  • transformers version: 4.22.0.dev0
  • Platform: Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.9.12
  • Huggingface_hub version: 0.8.1
  • PyTorch version (GPU?): 1.12.0+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: <fill in>
  • Using distributed or parallel set-up in script?: <fill in>

Who can help?

Hi @patil-suraj, when I run run_clip.py following the steps in the README, I get the following error:

[INFO|trainer.py:2644] 2022-08-02 04:07:15,699 >> Saving model checkpoint to clip-roberta-finetuned/checkpoint-4500
[INFO|configuration_utils.py:446] 2022-08-02 04:07:15,701 >> Configuration saved in clip-roberta-finetuned/checkpoint-4500/config.json
[INFO|modeling_utils.py:1567] 2022-08-02 04:07:17,602 >> Model weights saved in clip-roberta-finetuned/checkpoint-4500/pytorch_model.bin
/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
 33%|█████████████████████████████████████████████████████████████████████████████▉    | 4623/13872 [1:56:27<3:50:22,  1.49s/it]
Traceback (most recent call last):
  File "/home/gsj/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 537, in <module>
    main()
  File "/home/gsj/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 508, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/trainer.py", line 1502, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/trainer.py", line 1744, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/trainer.py", line 2474, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/trainer.py", line 2506, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.gather(outputs, self.output_device)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 181, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/scatter_gather.py", line 78, in gather
    res = gather_map(outputs)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/scatter_gather.py", line 69, in gather_map
    return type(out)((k, gather_map([d[k] for d in outputs]))
  File "<string>", line 10, in __init__
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/transformers/utils/generic.py", line 188, in __post_init__
    for element in iterator:
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/scatter_gather.py", line 69, in <genexpr>
    return type(out)((k, gather_map([d[k] for d in outputs]))
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/_functions.py", line 75, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/root/anaconda3/envs/h-transformers/lib/python3.9/site-packages/torch/nn/parallel/comm.py", line 235, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 1 has invalid shape [4, 4], but expected [4, 5]

How can I solve this error? Thanks!
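The traceback ends inside nn.DataParallel's gather, which concatenates per-GPU outputs along dim 0 and requires all other dims to match. A CLIP-style model returns a square similarity matrix of shape [local_batch, local_batch] per replica, so an uneven batch split (e.g. 9 examples over 2 GPUs → 5 and 4) produces [5, 5] and [4, 4], which cannot be gathered. Below is a minimal pure-Python sketch of that failure mode; `split_batch` and `gather_square_logits` are hypothetical illustrations of the mechanism, not code from transformers or torch:

```python
def split_batch(global_batch: int, n_gpus: int) -> list:
    """Mimic DataParallel's scatter: ceil-sized chunks, smaller remainder last."""
    chunk = -(-global_batch // n_gpus)  # ceil division
    sizes, remaining = [], global_batch
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes


def gather_square_logits(global_batch: int, n_gpus: int):
    """Simulate gathering per-replica [local_batch, local_batch] logits along dim 0."""
    sizes = split_batch(global_batch, n_gpus)
    shapes = [(s, s) for s in sizes]  # e.g. CLIP's logits_per_image per replica
    # gather(dim=0) concatenates along dim 0; all trailing dims must match
    if len({cols for _, cols in shapes}) > 1:
        raise RuntimeError(
            f"Input tensor at index 1 has invalid shape {list(shapes[1])}, "
            f"but expected [{shapes[1][0]}, {shapes[0][1]}]"
        )
    return (sum(rows for rows, _ in shapes), shapes[0][1])


# Even split across 2 GPUs: gathers fine.
print(gather_square_logits(8, 2))
# Uneven split (e.g. the last, smaller batch of an epoch): raises the
# same "invalid shape [4, 4], but expected [4, 5]" RuntimeError as above.
```

This matches the reported shapes: a 9-example batch split 5/4 yields [5, 5] and [4, 4], and gather complains that index 1 has shape [4, 4] where it expected [4, 5].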

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

python run_clip.py

Expected behavior

run_clip.py runs successfully

Issue Analytics

  • State:closed
  • Created a year ago
  • Comments:11 (1 by maintainers)

Top GitHub Comments

1 reaction
gongshaojie12 commented, Aug 8, 2022

Hi, @ydshieh I got it, thanks a lot!

1 reaction
gongshaojie12 commented, Aug 5, 2022

> @gongshaojie12 I want to double-check: are you using multiple GPUs?

Hi @ydshieh, yes, I used two GPUs for training.
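The thread confirms the error only appears with two GPUs: when multiple GPUs are visible and no distributed launcher is used, Trainer wraps the model in nn.DataParallel, whose gather step is what fails here. Two general workarounds (a hedged sketch, not the maintainers' confirmed fix for this issue; trailing arguments elided) are to expose a single GPU, or to launch with torchrun so Trainer uses DistributedDataParallel, which computes the loss per process instead of gathering square logit matrices:

```shell
# Option 1: sidestep nn.DataParallel by exposing a single GPU
CUDA_VISIBLE_DEVICES=0 python run_clip.py ...

# Option 2: launch with torchrun so Trainer uses DistributedDataParallel
torchrun --nproc_per_node=2 run_clip.py ...
```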

Read more comments on GitHub >

Top Results From Across the Web

Error when running in CPU mode · Issue #70 - GitHub
I get RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half' when running this against my CPU. To reproduce. $ python generate.

RuntimeError: CUDA out of memory with Clip interrogator
When I try to run Clip interrogator on Automatic1111 (locally on PC with GTX 1060), ... I'm not into python, and I fear...

python - "RuntimeError: asyncio.run() cannot be called from a ...
It's a known problem related to IPython. One way as you already found is to use nest_asyncio : import nest_asyncio nest_asyncio.apply().

VSGAN - VapourSynth GAN Implementation ... - Doom9's Forum
Just open a console window wherever python.exe is, ... scale=2 ) clip = vsgan_device.run(clip=clip, chunk=True) clip.set_output().

VSGAN - VapourSynth GAN Implementation ... - Doom9's Forum
https://github.com/Oriode/ESRGAN-Til...ter/upscale.py. They just need to be "massaged" into ... RuntimeError: CUDA out of memory.
