Out of memory error with NVIDIA K80 GPU
See original GitHub issue

I am trying to create an image classifier with ~1000 training samples and 7 classes, but it throws a runtime error. Is there a way to reduce the batch size, or something else that can be done to circumvent this?
The following is the error:

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
/usr/lib/python3.5/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 2 leaked semaphores to clean up at shutdown len(cache))
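The comments below mention load_image_dataset and constant.MAX_BATCH_SIZE, which point to the Auto-Keras 0.x API, so the run that hits this error presumably looks roughly like the sketch below. The file paths and the time_limit value are illustrative assumptions, not taken from the original report.

```python
# Hypothetical reproduction sketch -- assumes the Auto-Keras 0.x API
# (ImageClassifier and load_image_dataset), which the comments below imply.
from autokeras import ImageClassifier
from autokeras.image.image_supervised import load_image_dataset

# Roughly ~1000 labeled images across 7 classes, as described in the issue.
# The CSV/image paths here are placeholders.
x_train, y_train = load_image_dataset(csv_file_path="train/labels.csv",
                                      images_path="train")

clf = ImageClassifier(verbose=True)
# The out-of-memory error is raised during the model search in fit();
# on a K80 it can appear as soon as a large candidate model is trained.
clf.fit(x_train, y_train, time_limit=12 * 60 * 60)
```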
Issue Analytics
- Created 5 years ago
- Reactions: 1
- Comments: 23 (7 by maintainers)
Top Results From Across the Web
- K80 crashed or wrong computation results on K80
  but I get correct computation results when using GTX 680 while get K80 crashed (maybe memory violation) or obtain wrong computation from K80...
- Memory allocation problem with multi-gpu (Tesla k80 ...)
  It seems unified memory access create problem if memory allocated on multi-gpu from two different processes which use different devices...
- Plugging Tesla K80 results in PCI resource allocation error
  Hi, I bought a Tesla K80 card and tried to integrate it into a workstation PC (of course with sufficient ventilation).
- Tesla K80 size problem - CUDA Programming and Performance
  My GPU has a maximum threads per block of 1024. The memory allocation on the GPU is performed to fit my kernel inputs...
- K80 GPU disappears when tries to run 2 TensorFlow ...
  We setup the BIOS to recognise the GPU memory: 83:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1) Subsystem: NVIDIA ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
When I first ran this with about 550 128x128 grayscale images using a Quadro P4000 with 8 GB of memory, it immediately crashed due to insufficient memory. I adjusted the constant.MAX_BATCH_SIZE parameter from the default of 128 down to 32, and then it worked for about an hour until crashing again. The error message was:

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
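A minimal sketch of the workaround described above, assuming the Auto-Keras 0.x Constant object; the module path is inferred from the constant.MAX_BATCH_SIZE name in the comment, not confirmed by the source.

```python
# Sketch of the workaround above: lower the search-wide batch-size cap
# before fitting. Assumes Auto-Keras 0.x, where the cap is assumed to live
# on autokeras.constant.Constant.
from autokeras.constant import Constant
from autokeras import ImageClassifier

Constant.MAX_BATCH_SIZE = 32  # default was reported as 128

clf = ImageClassifier(verbose=True)
clf.fit(x_train, y_train)  # x_train, y_train loaded as in the earlier sketch
```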
I was watching the GPU memory usage before it crashed, and it fluctuated in cycles as expected for a "grid search" sort of activity. Unfortunately, it looks like the peak memory usage of the more memory-intensive models progressively increases until it overwhelms the GPU memory.
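One way to confirm that behavior is to log PyTorch's peak-memory counter between trials. A small illustrative sketch (the search loop and trial names are hypothetical, not part of the original report):

```python
import torch

# Illustrative sketch: track peak GPU memory across successive model trials.
# torch.cuda.max_memory_allocated() reports the high-water mark since the
# last reset, so resetting it per trial exposes each model's peak.
# (On older PyTorch versions the reset call is reset_max_memory_allocated().)
def log_peak_memory(trial_name):
    peak_mb = torch.cuda.max_memory_allocated() / 1024**2
    print(f"{trial_name}: peak GPU memory {peak_mb:.0f} MiB")
    torch.cuda.reset_peak_memory_stats()

# Usage inside a hypothetical search loop:
# for i, model in enumerate(candidate_models):
#     train(model)
#     log_peak_memory(f"trial {i}")
```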
Maybe it would be good, on program initialization, to quantify the available GPU memory and then cap the model search to models that fit within that limit. If the program determines that it cannot identify an optimal model within that constraint and may require more memory, it could say so and suggest how to work around it (e.g., smaller batches, smaller images, a GPU with more memory). It might also help to offer a grayscale option in the load_image_dataset method that reduces a color image from three color channels to one grayscale channel.
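Neither feature exists in the report above; a rough sketch of how the two suggestions could look, using PyTorch's memory queries and Pillow for the channel reduction. The helper names are hypothetical.

```python
import torch
from PIL import Image

def free_gpu_memory_bytes(device=0):
    """Rough estimate of free memory on the given GPU: total capacity minus
    what this PyTorch process has currently allocated (ignores other
    processes sharing the card)."""
    total = torch.cuda.get_device_properties(device).total_memory
    return total - torch.cuda.memory_allocated(device)

def to_grayscale(path):
    """Hypothetical helper for the suggested load_image_dataset option:
    collapse three color channels into a single grayscale channel."""
    return Image.open(path).convert("L")

budget = free_gpu_memory_bytes()
print(f"GPU memory budget for the model search: {budget / 1024**2:.0f} MiB")
# A search loop could skip (or shrink) candidate models whose estimated
# footprint exceeds `budget`, instead of crashing mid-search.
```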
Also, what is the LIMIT_MEMORY parameter?
This issue is fixed in the new release. Thank you all for the contribution.