Training with a Single GPU - CUDA out of memory.

Hey,

I have been trying to run the repo on a custom dataset for a while now. After some effort, I believe the custom dataset is prepared correctly.

However, I am now running out of CUDA memory. Could you help me find what to change in the configs to reduce GPU memory usage during training (e.g. mini-batching)?

I am currently using 2 samples per GPU on a single GPU, launched with the bash code below (just to get the repo working):

for FOLD in 1;
do
  bash tools/dist_train_partially.sh semi ${FOLD} 5 1
done

Thanks

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 6

Top Results From Across the Web

Efficient Training on a Single GPU - Hugging Face
This guide focuses on training large models efficiently on a single GPU. ... 1)).to("cuda") >>> print_gpu_utilization() GPU memory occupied: 1343 MB.
Solving "CUDA out of memory" Error - Kaggle
If you try to train multiple models on GPU, you are most likely to encounter some error similar to this one: RuntimeError: CUDA...
Resolving CUDA Being Out of Memory With Gradient ...
Implementing gradient accumulation and automatic mixed precision to solve CUDA out of memory issue when training big deep learning models ...
cuda out of memory during training - Stack Overflow
I am using Pytorch to do a cat-dog classification. I keep getting a Cuda out of memory problem during training and validation. If...
GPU usage out of memory #9320 - ultralytics/yolov5 - GitHub
CUDA Out of Memory Solutions ... If you encounter a CUDA OOM error, the steps you can take to reduce your memory usage...
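
Gradient accumulation, named in one of the results above, is a common way to keep the effective batch size while holding fewer samples in GPU memory at once. The sketch below is a framework-free illustration on a toy one-parameter model, not code from the repo in the issue. The function names, the data, and the value of `w` are all made up for the example, and the issue does not say whether the repo's configs expose such an option. It shows only the arithmetic: scaling each micro-batch gradient by the number of accumulation steps and summing them reproduces the full-batch gradient.

```python
# Sketch: gradient accumulation on a toy 1-D linear model y = w * x,
# using mean-squared error. Pure Python, no GPU or framework needed.
# All names and numbers here are hypothetical, chosen for illustration.

def mse_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches.

    Each micro-batch gradient is divided by the number of accumulation
    steps so that the running sum matches the full-batch mean gradient.
    """
    if len(data) % micro_batch_size != 0:
        raise ValueError("data must split evenly into micro-batches")
    steps = len(data) // micro_batch_size
    total = 0.0
    for i in range(steps):
        micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += mse_grad(w, micro) / steps  # scale, then accumulate
    return total


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5

full = mse_grad(w, data)                               # one batch of 4
accum = accumulated_grad(w, data, micro_batch_size=1)  # 4 batches of 1

print(abs(full - accum) < 1e-9)
```

In a real training loop the saving comes from holding only one micro-batch's activations in memory at a time, with the optimizer step taken after the last micro-batch. Layers that depend on batch statistics, such as batch normalization, still see the smaller micro-batch, so results may not match a true large batch exactly.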
