
CUDA out of memory when I want to train DreamBooth

See original GitHub issue

Describe the bug

I’m using a T4 on the Colab free tier. When I start training I get a CUDA out-of-memory error; it happens when I activate prior_preservation.
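
Enabling prior preservation roughly doubles the per-step memory footprint, because each step runs both the instance images and the class (prior) images through the UNet and then combines the two losses, which helps explain why the T4’s ~15 GiB runs out here. Below is a minimal sketch of that combined loss, using dummy tensors in place of the real UNet outputs; the shapes and the prior_loss_weight value are illustrative assumptions, not values taken from the notebook.

import torch
import torch.nn.functional as F

# Dummy stand-ins for the UNet prediction and the target noise. With prior
# preservation the instance batch and the class batch are concatenated, so
# the leading dimension (and activation memory) is roughly doubled.
noise_pred = torch.randn(2, 4, 64, 64)
noise = torch.randn(2, 4, 64, 64)
prior_loss_weight = 1.0  # illustrative weight

# Split back into the instance half and the prior (class) half.
noise_pred, noise_pred_prior = torch.chunk(noise_pred, 2, dim=0)
noise, noise_prior = torch.chunk(noise, 2, dim=0)

# Instance loss plus weighted prior-preservation loss.
loss = F.mse_loss(noise_pred, noise, reduction="none").mean([1, 2, 3]).mean()
prior_loss = F.mse_loss(noise_pred_prior, noise_prior, reduction="none").mean([1, 2, 3]).mean()
loss = loss + prior_loss_weight * prior_loss
print(loss.item())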


Run training

Launching training on one GPU.
Steps: 0%
1/450 [00:10<1:20:12, 10.72s/it, loss=0.0338]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-c6e3ce5f5a40> in <module>
      1 #@title Run training
      2 import accelerate
----> 3 accelerate.notebook_launcher(training_function, args=(text_encoder, vae, unet))
      4 with torch.no_grad():
      5     torch.cuda.empty_cache()

7 frames
/usr/local/lib/python3.7/dist-packages/accelerate/launchers.py in notebook_launcher(function, args, num_processes, use_fp16, mixed_precision, use_port)
     81             else:
     82                 print("Launching training on one CPU.")
---> 83             function(*args)
     84 
     85     else:

<ipython-input-1-d9553ec566fc> in training_function(text_encoder, vae, unet)
    364                     loss = F.mse_loss(noise_pred, noise, reduction="none").mean([1, 2, 3]).mean()
    365 
--> 366                 accelerator.backward(loss)
    367                 accelerator.clip_grad_norm_(unet.parameters(), args.max_grad_norm)
    368                 optimizer.step()

/usr/local/lib/python3.7/dist-packages/accelerate/accelerator.py in backward(self, loss, **kwargs)
    882             self.scaler.scale(loss).backward(**kwargs)
    883         else:
--> 884             loss.backward(**kwargs)
    885 
    886     def unscale_gradients(self, optimizer=None):

/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    394                 create_graph=create_graph,
    395                 inputs=inputs)
--> 396         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    397 
    398     def register_hook(self, hook):

/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    173     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 175         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
    176 
    177 def grad(

/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py in apply(self, *args)
    251                                "of them.")
    252         user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn
--> 253         return user_fn(self, *args)
    254 
    255     def apply_jvp(self, *args):

/usr/local/lib/python3.7/dist-packages/torch/utils/checkpoint.py in backward(ctx, *args)
    144                 "none of output has requires_grad=True,"
    145                 " this checkpoint() is not necessary")
--> 146         torch.autograd.backward(outputs_with_grad, args_with_grad)
    147         grads = tuple(inp.grad if isinstance(inp, torch.Tensor) else None
    148                       for inp in detached_inputs)

/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    173     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 175         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
    176 
    177 def grad(

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 14.76 GiB total capacity; 12.24 GiB already allocated; 877.75 MiB free; 12.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
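
The last line of the error suggests tuning max_split_size_mb when reserved memory is much larger than allocated memory. A minimal sketch of setting that allocator option from a notebook cell follows; the 128 MiB value is an arbitrary illustration, and the setting only takes effect if it is in place before PyTorch makes its first CUDA allocation (in Colab, practically, right after a runtime restart and before the models are loaded). It can reduce fragmentation, but it will not by itself make a workload fit that genuinely needs more than the T4’s ~15 GiB.

import os

# Cap the block size the caching allocator is willing to split. This can
# reduce fragmentation when reserved memory is much larger than allocated
# memory. 128 is an illustrative value, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var, before any CUDA allocation
print(torch.cuda.is_available())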

Reproduction

No response

Logs

No response

System Info

Google Colab free tier with a T4 GPU

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 9 (7 by maintainers)

Top GitHub Comments

13 reactions
skirsten commented, Oct 7, 2022

I played around with some of the settings and they indeed fix the “CUDA out of memory” problem on the RTX 3090 (24 GB); a sketch of what these flags do inside the training setup follows the table:

args | performance
--gradient_accumulation_steps=1 | CUDA out of memory
--gradient_accumulation_steps=1 --gradient_checkpointing | 0.97 steps/s @ 21.9 GB
--gradient_accumulation_steps=1 --use_8bit_adam | 1.20 steps/s @ 23.4 GB 🔝
--gradient_accumulation_steps=1 --gradient_checkpointing --use_8bit_adam | 1.06 steps/s @ 15.8 GB
--gradient_accumulation_steps=2 | CUDA out of memory
--gradient_accumulation_steps=2 --gradient_checkpointing | 0.53 steps/s @ 21.9 GB
--gradient_accumulation_steps=2 --use_8bit_adam | 0.63 steps/s @ 23.4 GB
--gradient_accumulation_steps=2 --gradient_checkpointing --use_8bit_adam | 0.55 steps/s @ 15.8 GB

using fp16:

--mixed_precision=fp16 --gradient_accumulation_steps=1 | CUDA out of memory
--mixed_precision=fp16 --gradient_accumulation_steps=1 --gradient_checkpointing | 1.16 steps/s @ 22.0 GB
--mixed_precision=fp16 --gradient_accumulation_steps=1 --use_8bit_adam | CUDA out of memory
--mixed_precision=fp16 --gradient_accumulation_steps=1 --gradient_checkpointing --use_8bit_adam | 1.30 steps/s @ 16.9 GB 🔝
--mixed_precision=fp16 --gradient_accumulation_steps=2 | CUDA out of memory
--mixed_precision=fp16 --gradient_accumulation_steps=2 --gradient_checkpointing | 0.64 steps/s @ 21.9 GB
--mixed_precision=fp16 --gradient_accumulation_steps=2 --use_8bit_adam | CUDA out of memory
--mixed_precision=fp16 --gradient_accumulation_steps=2 --gradient_checkpointing --use_8bit_adam | 0.68 steps/s @ 16.9 GB
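
For anyone running the Colab notebook rather than the CLI script, the flags above correspond roughly to the pieces of training setup sketched below. This is a hedged sketch under stated assumptions, not the notebook’s actual code: it reuses the notebook’s existing unet model, the learning rate is an illustrative value, and the gradient_accumulation_steps argument to Accelerator assumes a recent accelerate release.

import bitsandbytes as bnb
from accelerate import Accelerator

# --mixed_precision=fp16 and --gradient_accumulation_steps=2
accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=2)

# --gradient_checkpointing: recompute activations during backward to save memory
# (assumes `unet` is the diffusers UNet2DConditionModel already loaded in the notebook).
unet.enable_gradient_checkpointing()

# --use_8bit_adam: keep optimizer state in 8 bits instead of full precision.
optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=5e-6)  # illustrative lr

Note that even the leanest configurations in the table (≈15.8–16.9 GB) were measured on a 24 GB card, so on a ~15 GiB T4 further reductions, such as a smaller resolution or batch size, may still be needed.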
1 reaction
tcapelle commented, Oct 20, 2022

it is!
