
Out of memory error with NVIDIA K80 GPU

See original GitHub issue

I'm trying to create an image classifier with ~1000 training samples and 7 classes, but it throws a runtime error. Is there a way to reduce the batch size, or something else I can do to circumvent this?

The error is:

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
/usr/lib/python3.5/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 2 leaked semaphores to clean up at shutdown
  len(cache))
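The usual first workaround is the one the question asks about: lowering the batch size used during the model search. Going by the API names in the comments below, this issue is against Auto-Keras, where the search batch size is a module-level constant. A minimal sketch, assuming the 0.2.x-era autokeras.constant.Constant layout (the exact import path is an assumption and may differ between releases):

    # Hedged sketch: shrink the largest batch size the search may use.
    # Assumes the 0.2.x-era Auto-Keras constant module mentioned in the
    # comments below; adjust the import to match your installed version.
    from autokeras.constant import Constant

    # Default is 128 (per the comment below); 32 leaves far more headroom
    # within the ~12 GB visible on one half of a K80.
    Constant.MAX_BATCH_SIZE = 32

Set this before constructing the classifier so the search picks it up.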

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 23 (7 by maintainers)

Top GitHub Comments

3 reactions
sparkdoc commented, Aug 14, 2018

When I first ran this with about 550 128x128 grayscale images on a Quadro P4000 with 8 GB of memory, it immediately crashed due to insufficient memory. I adjusted the constant.MAX_BATCH_SIZE parameter from the default of 128 down to 32, and it then ran for about an hour before crashing again with:

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

I was watching the GPU memory usage before it crashed, and it fluctuated in cycles, as expected for a “grid search” sort of activity. Unfortunately, it looks like the peak memory usage of the more memory-intensive models progressively increases until it overwhelms the GPU memory.
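For anyone who wants to log this instead of watching nvidia-smi, a small helper along these lines reports PyTorch's own view of current and peak tensor allocations between trials (the helper name is illustrative, not from the issue, and the calls assume a reasonably recent PyTorch):

    import torch

    def log_gpu_memory(tag, device=0):
        """Print current and peak GPU memory held by PyTorch tensors."""
        current = torch.cuda.memory_allocated(device) / 1024 ** 2
        peak = torch.cuda.max_memory_allocated(device) / 1024 ** 2
        print(f"[{tag}] allocated: {current:.0f} MiB, peak: {peak:.0f} MiB")

    # Example: call between candidate models, then reset the peak counter so
    # each model's high-water mark is reported separately.
    log_gpu_memory("after model 3")
    torch.cuda.reset_peak_memory_stats(0)

Note that these counters only track tensors PyTorch itself allocated; the CUDA context and cached-but-unused blocks are not included in memory_allocated.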

Maybe it would be good, upon initialization of the program, to quantify the available GPU memory and then cap the model search to models that fit within that limit. If the program determines that it cannot identify an optimal model within that constraint and would need more memory, it could report that and offer hints on how to proceed (e.g., smaller batches, smaller images, a GPU with more memory, etc.). It might also help to offer a grayscale option in the load_image_dataset method that reduces a color image from three color channels to one grayscale channel.
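A rough sketch of that pre-flight idea, using torch.cuda calls from recent PyTorch releases plus a PIL-based grayscale load; the function names and the 0.8 safety factor are illustrative, not anything Auto-Keras actually exposes:

    import torch
    from PIL import Image

    def free_gpu_memory_bytes(device=0):
        """Total device memory minus what PyTorch has already reserved."""
        total = torch.cuda.get_device_properties(device).total_memory
        reserved = torch.cuda.memory_reserved(device)
        return total - reserved

    def fits_on_gpu(estimated_model_bytes, device=0, safety_factor=0.8):
        """Skip candidate models whose estimated footprint exceeds the budget."""
        return estimated_model_bytes < safety_factor * free_gpu_memory_bytes(device)

    def load_grayscale(path):
        """Collapse three color channels to one, cutting input memory roughly 3x."""
        return Image.open(path).convert("L")

A search loop could call fits_on_gpu before training each candidate and either skip the model or emit the suggested hint (smaller batches, smaller images, a larger GPU) when nothing fits.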

Also, what is the LIMIT_MEMORY parameter?

1 reaction
haifeng-jin commented, Aug 24, 2018

This issue is fixed in the new release. Thank you all for the contribution.

Read more comments on GitHub >

Top Results From Across the Web

  • K80 crashed or wrong computation results on K80: but I get correct computation results when using GTX 680 while get K80 crashed (maybe memory violation) or obtain wrong computation from K80...
  • Memory allocation problem with multi-gpu (Tesla k80 ...): It seems unified memory access create problem if memory allocated on multi-gpu from two diffrenet processes which use different devices...
  • Plugging Tesla K80 results in PCI resource allocation error: Hi, I bought a Tesla K80 card and tried to integrate it into a workstation PC (of course with sufficient ventilation).
  • Tesla K80 size problem - CUDA Programming and Performance: My GPU has a maximum threads per block of 1024. The memory allocation on the GPU is performed to fit my kernel inputs...
  • K80 GPU disappears when tries to run 2 TensorFlow ...: We setup the BIOS to recognise the GPU memory: 83:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1) Subsystem: NVIDIA ...
