
CUDA out of memory

See original GitHub issue

I’m trying to run example.py in Colab, and when I load a pre-trained model as:

model = spvnas_specialized('SemanticKITTI_val_SPVNAS@20GMACs').to(device)

I get the following out-of-memory error, which is strange, as I don’t think the network actually has that many parameters:

RuntimeError: CUDA out of memory. Tried to allocate 127.56 GiB (GPU 0; 14.76 GiB total capacity; 1.25 GiB already allocated; 12.37 GiB free; 1.36 GiB reserved in total by PyTorch)
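As a rough sanity check (a sketch, assuming float32 tensors at 4 bytes per element — an assumption, since the error does not say which dtype was requested), the 127.56 GiB allocation in the error above would correspond to roughly 3.4 × 10¹⁰ values. That is far larger than any plausible weight or activation tensor for this model, which suggests a bug in the allocation size rather than a genuinely oversized network:

```python
GIB = 2 ** 30          # bytes per GiB
FLOAT32_BYTES = 4      # assumed element size

def elements_for(gib: float, bytes_per_elem: int = FLOAT32_BYTES) -> float:
    """Number of elements implied by an allocation request of `gib` GiB."""
    return gib * GIB / bytes_per_elem

# The 127.56 GiB figure comes from the RuntimeError above.
print(f"{elements_for(127.56):.2e} elements")
```

On a 14.76 GiB GPU with ~12 GiB free, no batch-size tweak can satisfy such a request, which is consistent with the fix ultimately landing in the library rather than in user code.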

Also, I want to note that I opened an issue in the torchsparse repository, as there are some problems running the tests on CPU with the Docker environment configured as you request.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
luiscastro1995 commented, Apr 12, 2021

I am having the exact same problem when running the SPVNAS tutorial in Colab with the GPU backend. When I run either of the commands:

  • model = spvnas_specialized('SemanticKITTI_val_SPVNAS@20GMACs') or
  • model = spvnas_specialized('SemanticKITTI_val_SPVNAS@65GMACs').to(device)

I get the following error:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-f5ea08b4d2a2> in <module>()
      1 # import SPVNAS model from model zoo
      2 from model_zoo import spvnas_specialized
----> 3 model = spvnas_specialized('SemanticKITTI_val_SPVNAS@20GMACs')
      4 # model = spvnas_specialized('SemanticKITTI_val_SPVNAS@65GMACs').to(device)
      5 

18 frames
/usr/local/lib/python3.7/dist-packages/torchsparse/nn/functional/conv.py in forward(ctx, features, kernel, neighbor_map, neighbor_offset, sizes, transpose)
     38             torchsparse_backend.sparseconv_forward(features, out, kernel,
     39                                                    neighbor_map,
---> 40                                                    neighbor_offset, transpose)
     41         else:
     42             # use the native pytorch XLA APIs for the TPU.

RuntimeError: CUDA out of memory. Tried to allocate 115.36 GiB (GPU 0; 14.76 GiB total capacity; 754.88 MiB already allocated; 12.97 GiB free; 770.00 MiB reserved in total by PyTorch)

Could this error be related to the version of torchsparse library installed?

Thank you!
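Since the comment above asks whether the installed torchsparse version could be the cause, a quick way to check what is actually installed, using only the Python standard library (the package names are the ones mentioned in this issue; nothing here is specific to the eventual fix):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version string of `pkg`, or None if absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Packages relevant to this issue; torchsparse is the one in question.
for pkg in ("torch", "torchsparse"):
    print(pkg, installed_version(pkg) or "not installed")
```

Comparing this output against the versions pinned in the repository's installation instructions is a reasonable first step before filing a bug.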

0 reactions
luiscastro1995 commented, Apr 15, 2021

Thank you @zhijian-liu. With your changes I can now run your tutorial successfully, both in Colab and on my local machine!
