RuntimeError: CUDA error: out of memory
❓ Questions and Help
When training on my own dataset with a ResNet-101 backbone, it always runs into the following error after about 27k iterations:
File "maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 75, in do_train
losses.backward()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 102, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: out of memory
By the way, the input size is set to (800, 1333).
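A quick way to narrow down this kind of OOM is to log GPU memory every few hundred iterations. The sketch below only uses the standard torch.cuda counters; log_gpu_memory is an illustrative helper, not something from maskrcnn-benchmark. If allocated memory grows steadily across iterations, a tensor is probably being kept alive from one iteration to the next; if it stays flat but close to the card's limit, lowering SOLVER.IMS_PER_BATCH or the INPUT.MIN_SIZE_TRAIN / INPUT.MAX_SIZE_TRAIN resolution in the config usually helps.

import torch

def log_gpu_memory(iteration, device=0):
    # Illustrative helper: report allocated and peak CUDA memory in MB so a
    # slow leak shows up well before the actual OOM at ~27k iterations.
    allocated = torch.cuda.memory_allocated(device) / 1024 ** 2
    peak = torch.cuda.max_memory_allocated(device) / 1024 ** 2
    print("iter {}: allocated={:.0f}MB peak={:.0f}MB".format(iteration, allocated, peak))

# Inside the training loop (e.g. do_train), call it every so often:
# if iteration % 500 == 0:
#     log_gpu_memory(iteration)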
Issue Analytics
- Created 5 years ago
- Comments: 23 (19 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @fmassa,
The OOM problem has been solved: I had accidentally duplicated the ground-truths several times, which pushed the number of GT boxes up to about 2k (very sorry for that). By the way, if you compute the IoUs between predictions and ground-truth on the CPU, you not only need to modify these lines but also need to pay attention to a few other lines, so that it can handle a large number of GT boxes at the cost of slower training (training time is maybe doubled); a rough sketch of the idea follows this comment.
About the hanging: since I upgraded Ubuntu 14.04 to 16.04 and installed CUDA 9.0 (or CUDA 9.2) with different NVIDIA drivers (390, 396, 410), it still happens occasionally. As @chengyangfu said, the frequency is much lower with driver 410.
Thanks!
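For reference, a minimal sketch of the CPU IoU idea mentioned above, assuming xyxy boxes; pairwise_iou_cpu is an illustrative standalone helper, not the actual boxlist_iou from maskrcnn_benchmark. Moving both sets of boxes to the CPU and chunking the predictions keeps the N x M intermediates small, which is what lets ~2k GT boxes pass at the cost of slower training.

import torch

def pairwise_iou_cpu(pred_boxes, gt_boxes, chunk_size=256):
    # Illustrative helper (not the library's boxlist_iou): IoU between an
    # [N, 4] and an [M, 4] tensor of xyxy boxes, computed on the CPU in
    # chunks so the N x M intermediate tensors never get large.
    pred_boxes = pred_boxes.detach().cpu()
    gt_boxes = gt_boxes.detach().cpu()
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    ious = []
    for chunk in pred_boxes.split(chunk_size):
        chunk_area = (chunk[:, 2] - chunk[:, 0]) * (chunk[:, 3] - chunk[:, 1])
        lt = torch.max(chunk[:, None, :2], gt_boxes[None, :, :2])   # [n, M, 2]
        rb = torch.min(chunk[:, None, 2:], gt_boxes[None, :, 2:])   # [n, M, 2]
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, :, 0] * wh[:, :, 1]                           # [n, M]
        union = chunk_area[:, None] + gt_area[None, :] - inter
        ious.append(inter / union.clamp(min=1e-6))
    return torch.cat(ious, dim=0)                                   # [N, M]

The result would then need to be moved back to the matcher's device if the rest of the pipeline expects GPU tensors.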
Thanks @fmassa. After upgrading Ubuntu 14.04 to 16.04, I will try what you suggest and then report my results here. Thanks again.