CUDA out of memory when trying to train DreamBooth
Describe the bug
I'm using a T4 on the free Colab tier. When I start training, I get a CUDA out-of-memory error; it happens when I activate prior_preservation.
Run training
Launching training on one GPU.
Steps: 0% 1/450 [00:10<1:20:12, 10.72s/it, loss=0.0338]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-2-c6e3ce5f5a40> in <module>
1 #@title Run training
2 import accelerate
----> 3 accelerate.notebook_launcher(training_function, args=(text_encoder, vae, unet))
4 with torch.no_grad():
5 torch.cuda.empty_cache()
7 frames
/usr/local/lib/python3.7/dist-packages/accelerate/launchers.py in notebook_launcher(function, args, num_processes, use_fp16, mixed_precision, use_port)
81 else:
82 print("Launching training on one CPU.")
---> 83 function(*args)
84
85 else:
<ipython-input-1-d9553ec566fc> in training_function(text_encoder, vae, unet)
364 loss = F.mse_loss(noise_pred, noise, reduction="none").mean([1, 2, 3]).mean()
365
--> 366 accelerator.backward(loss)
367 accelerator.clip_grad_norm_(unet.parameters(), args.max_grad_norm)
368 optimizer.step()
/usr/local/lib/python3.7/dist-packages/accelerate/accelerator.py in backward(self, loss, **kwargs)
882 self.scaler.scale(loss).backward(**kwargs)
883 else:
--> 884 loss.backward(**kwargs)
885
886 def unscale_gradients(self, optimizer=None):
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
394 create_graph=create_graph,
395 inputs=inputs)
--> 396 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
397
398 def register_hook(self, hook):
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
173 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
174 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 175 allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
176
177 def grad(
/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py in apply(self, *args)
251 "of them.")
252 user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn
--> 253 return user_fn(self, *args)
254
255 def apply_jvp(self, *args):
/usr/local/lib/python3.7/dist-packages/torch/utils/checkpoint.py in backward(ctx, *args)
144 "none of output has requires_grad=True,"
145 " this checkpoint() is not necessary")
--> 146 torch.autograd.backward(outputs_with_grad, args_with_grad)
147 grads = tuple(inp.grad if isinstance(inp, torch.Tensor) else None
148 for inp in detached_inputs)
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
173 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
174 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 175 allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
176
177 def grad(
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 14.76 GiB total capacity; 12.24 GiB already allocated; 877.75 MiB free; 12.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
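The last line of the traceback suggests one mitigation: setting max_split_size_mb through PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation. A minimal sketch, assuming training is launched from a shell; in a Colab notebook the variable has to be in the process environment (for example via os.environ) before torch first initializes CUDA, and the value 128 below is only an illustrative guess:

# Reduce CUDA-allocator fragmentation, as hinted by the error message.
# 128 MiB is a hypothetical max split size; tune for your workload.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

Since reserved memory (12.79 GiB) is only slightly above allocated memory (12.24 GiB) here, fragmentation is probably not the main problem, so reducing the training footprint (see the flag combinations in the comments below) is more likely to help.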
Reproduction
No response
Logs
No response
System Info
T4 GPU on the free Colab tier
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I played around with some of the settings and they do indeed fix the “CUDA out of memory” problem on an RTX 3090 (24 GB). Each of the following flag combinations trains without running out of memory (a sketch of a full launch command using these flags follows the list):
--gradient_accumulation_steps=1
--gradient_accumulation_steps=1 --gradient_checkpointing
--gradient_accumulation_steps=1 --use_8bit_adam
--gradient_accumulation_steps=1 --gradient_checkpointing --use_8bit_adam
--gradient_accumulation_steps=2
--gradient_accumulation_steps=2 --gradient_checkpointing
--gradient_accumulation_steps=2 --use_8bit_adam
--gradient_accumulation_steps=2 --gradient_checkpointing --use_8bit_adam
--mixed_precision=fp16 --gradient_accumulation_steps=1
--mixed_precision=fp16 --gradient_accumulation_steps=1 --gradient_checkpointing
--mixed_precision=fp16 --gradient_accumulation_steps=1 --use_8bit_adam
--mixed_precision=fp16 --gradient_accumulation_steps=1 --gradient_checkpointing --use_8bit_adam
--mixed_precision=fp16 --gradient_accumulation_steps=2
--mixed_precision=fp16 --gradient_accumulation_steps=2 --gradient_checkpointing
--mixed_precision=fp16 --gradient_accumulation_steps=2 --use_8bit_adam
--mixed_precision=fp16 --gradient_accumulation_steps=2 --gradient_checkpointing --use_8bit_adam
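For reference, here is a sketch of how these flags could be combined in a full launch of the diffusers DreamBooth example script with prior preservation enabled. The model name, data directories, prompts, and hyperparameter values below are placeholders for illustration, not taken from this issue:

# Hypothetical launch command combining the memory-saving flags above.
# Paths, prompts, and hyperparameter values are illustrative only.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="./instance_images" \
  --class_data_dir="./class_images" \
  --output_dir="./dreambooth_output" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision=fp16 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=450

On the reporter's T4 (about 15 GiB usable), the most aggressive combination, fp16 plus gradient checkpointing plus 8-bit Adam, is the most likely to fit; note that --use_8bit_adam additionally requires the bitsandbytes package to be installed.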