Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

RuntimeError: CUDA out of memory

See original GitHub issue

Issue Description

I’m training Document Information Extraction on a custom dataset of 100 training and 20 validation images. This is the config I used:

resume_from_checkpoint_path: null 
result_path: "./result"
pretrained_model_name_or_path: "naver-clova-ix/donut-base"
dataset_name_or_paths: ["/content/drive/MyDrive/donut_1.1"] # should be prepared from
sort_json_key: True
train_batch_sizes: [1]
val_batch_sizes: [1]
input_size: [2560, 1920]
max_length: 128
align_long_axis: False
# num_nodes: 8 
num_nodes: 1
seed: 2022
lr: 3e-5
warmup_steps: 10000
num_training_samples_per_epoch: 39463
max_epochs: 300
max_steps: -1
num_workers: 8
val_check_interval: 1.0
check_val_every_n_epoch: 10
gradient_clip_val: 0.25
verbose: True

I’m getting this error with message:

RuntimeError: CUDA out of memory. Tried to allocate 76.00 MiB (GPU 0; 14.76 GiB total capacity; 13.48 GiB already allocated; 6.75 MiB free; 13.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
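The allocator hint at the end of the error message can be tried directly. As a sketch (the value 128 is an illustrative guess, not a tuned recommendation), the environment variable must be set before the first CUDA allocation:

```python
import os

# Set the allocator hint *before* importing torch or touching the GPU.
# max_split_size_mb caps the block size the caching allocator may split,
# which can reduce fragmentation when reserved memory >> allocated memory,
# as in the error above. 128 is an illustrative value, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```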

I tried clearing the torch cache with torch.cuda.empty_cache(), and reducing the batch size didn’t help. I also tried a smaller dataset (50 training, 10 validation images), half the size of the earlier one, but the failed allocation is the same 76.00 MiB.
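If smaller batches alone don’t help, gradient accumulation keeps the effective batch size while lowering per-step memory. A minimal sketch, where the tiny linear model and random data are hypothetical stand-ins for the Donut model and dataloader:

```python
import torch

# Gradient accumulation: run several small micro-batches, accumulate their
# gradients, and take one optimizer step, so per-step activation memory
# stays low while the effective batch size stays large.
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

accum_steps = 4  # micro-batches per optimizer step
data = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(8)]

steps_taken = 0
opt.zero_grad()
for i, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                            # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
        steps_taken += 1
```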

Is there any way that I can solve this issue? Please help!

Issue Analytics

  • State: closed
  • Created 4 months ago
  • Comments:5

Top GitHub Comments

chai21b commented, Oct 16, 2022

Thanks! Reducing the input_size from [2560, 1920] to [1920, 1280] helped.

inesriahi commented, Dec 4, 2022

Is there a way to decrease the GPU memory consumption further? I want to fine-tune it on an 8GB GPU.
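One commonly suggested lever (general PyTorch practice, not specific to this repo) is mixed precision, which roughly halves activation memory. A minimal sketch, shown on CPU with bfloat16 so it runs anywhere; on a GPU you would pass device_type="cuda", typically pairing float16 with torch.cuda.amp.GradScaler:

```python
import torch

# Mixed precision: autocast runs eligible ops (e.g. linear/matmul) in a
# lower-precision dtype, cutting activation memory roughly in half.
model = torch.nn.Linear(16, 4)
x = torch.randn(2, 16)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)  # the matmul runs in bfloat16 inside the autocast region
```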

Read more comments on GitHub >

Top Results From Across the Web

"RuntimeError: CUDA error: out of memory" - Stack Overflow
The error occurs because you ran out of memory on your GPU. One way to solve it is to reduce the batch size...

Solving "CUDA out of memory" Error - Kaggle
RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU ... 4) Here is the full code for releasing CUDA memory:...

Solving the “RuntimeError: CUDA Out of memory” error
Solving the “RuntimeError: CUDA Out of memory” error · Reduce the `batch_size` · Lower the Precision · Do what the error says ·...

Stable Diffusion Runtime Error: How To Fix CUDA Out Of ...
How To Fix Runtime Error: CUDA Out Of Memory In Stable Diffusion · Restarting the PC worked for some people. · Reduce the...

CUDA out of memory despite available memory #485 - GitHub
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.80 GiB total capacity; 6.70 GiB already allocated; 12.44 MiB...
