
CUDA out of memory

See original GitHub issue

Not sure how I ran out of memory, given this is the only time I've tried running something like this myself rather than on a Colab. Running nvidia-smi shows processes with “N/A” GPU Memory Usage, and I don’t know how to kill any of these (they don’t go away when Python quits). The error is as follows:

  File "c:\python\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Python\Scripts\dream.exe\__main__.py", line 7, in <module>
  File "c:\python\lib\site-packages\big_sleep\cli.py", line 65, in main
    fire.Fire(train)
  File "c:\python\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "c:\python\lib\site-packages\fire\core.py", line 471, in _Fire
    target=component.__name__)
  File "c:\python\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\cli.py", line 62, in train
    imagine()
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\big_sleep.py", line 407, in forward
    loss = self.train_step(epoch, i, image_pbar)
  File "c:\python\lib\site-packages\big_sleep\big_sleep.py", line 357, in train_step
    losses = self.model(self.encoded_texts["max"], self.encoded_texts["min"])
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\big_sleep.py", line 216, in forward
    image_embed = perceptor.encode_image(into)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 519, in encode_image
    return self.visual(image.type(self.dtype))
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 410, in forward
    x = self.transformer(x)
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 381, in forward
    return self.resblocks(x)
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 369, in forward
    x = x + self.mlp(self.ln_2(x))
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 346, in forward
    return x * torch.sigmoid(1.702 * x)
RuntimeError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 8.00 GiB total capacity; 5.32 GiB already allocated; 28.04 MiB free; 5.53 GiB reserved in total by PyTorch)
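
For context on the numbers in that message: “already allocated” counts live tensors, while “reserved in total by PyTorch” also includes blocks the caching allocator is holding on to, which still show up as used in tools like nvidia-smi. A minimal sketch (not from the issue) for inspecting those figures from Python:

import torch

if torch.cuda.is_available():
    # Memory currently occupied by live tensors
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    # Memory held by PyTorch's caching allocator (live tensors + cache)
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")

    # Return cached blocks to the driver; this does not free live tensors,
    # but it makes nvidia-smi line up better with what PyTorch actually uses
    torch.cuda.empty_cache()

    # Detailed allocator report
    print(torch.cuda.memory_summary())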

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 14 (10 by maintainers)

Top GitHub Comments

2 reactions
scambier commented, May 26, 2021

For those in need of a quick solution: make a Python script and use the num_cutouts option.

from big_sleep import Imagine

dream = Imagine(
    text = "a pyramid made of ice",
    lr = 5e-2,
    save_every = 25,
    save_progress = True,
    num_cutouts = 64 # 64 is ok for 6GB of video memory
)

dream()

2 reactions
WiseNat commented, Mar 19, 2021

To use this, PyTorch requires a decent amount of VRAM, probably around 8 GB for preset one.

Setting the image_size parameter in the Imagine constructor to either 128 or 256 seems to lower the amount of memory being allocated.
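
Since both workarounds go through the same constructor, here is a small sketch combining them (image_size from this comment, num_cutouts from the earlier one; the text prompt is just a placeholder):

from big_sleep import Imagine

dream = Imagine(
    text = "a pyramid made of ice",
    image_size = 256,   # drop to 128 if 256 still exhausts VRAM
    num_cutouts = 64,   # fewer cutouts also lowers peak allocation
)

dream()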

Read more comments on GitHub >

Top Results From Across the Web

"RuntimeError: CUDA error: out of memory" - Stack Overflow
The error occurs because you ran out of memory on your GPU. One way to solve it is to reduce the batch size...
Read more >
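
That advice is generic PyTorch guidance rather than anything specific to big_sleep (which exposes num_cutouts instead of a batch size). As a rough illustration with made-up data, halving the DataLoader batch size roughly halves the activation memory used per forward pass:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for real training data
dataset = TensorDataset(torch.randn(1024, 3, 224, 224),
                        torch.randint(0, 10, (1024,)))

# If batch_size=32 triggers the OOM, try 16, then 8, until the model fits
loader = DataLoader(dataset, batch_size=16, shuffle=True)
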
Solving "CUDA out of memory" Error
Solving "CUDA out of memory" Error · 1) Use this code to see memory usage (it requires internet to install package): · 2)...
Read more >
Solving the “RuntimeError: CUDA Out of memory” error
Reduce the `batch_size` · Lower the Precision · Do what the error says ·...
Read more >
Resolving CUDA Being Out of Memory With Gradient ...
Implementing gradient accumulation and automatic mixed precision to solve CUDA out of memory issue when training big deep learning models ...
Read more >
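
The gradient accumulation plus automatic mixed precision approach mentioned there looks roughly like the following in plain PyTorch; this is a hedged sketch with a toy model and data, not the article's code:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy model and data; substitute your own
model = nn.Linear(128, 10).cuda()
loader = DataLoader(TensorDataset(torch.randn(256, 128),
                                  torch.randint(0, 10, (256,))),
                    batch_size=8)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

scaler = torch.cuda.amp.GradScaler()  # keeps fp16 gradients from underflowing
accum_steps = 4                       # effective batch size = 8 * 4 = 32

for step, (x, y) in enumerate(loader):
    x, y = x.cuda(), y.cuda()
    with torch.cuda.amp.autocast():   # forward pass in mixed precision
        loss = criterion(model(x), y) / accum_steps
    scaler.scale(loss).backward()     # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)        # apply the accumulated gradients
        scaler.update()
        optimizer.zero_grad()
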
CUDA out of memory despite available memory · Issue #485
At the end I just want to help you, and the error says that you don't have enough GPU RAM. You can install...
Read more >
