
How to reduce CuPy memory usage?

See original GitHub issue

Description

When using CuPy, it takes up a lot of memory by default (about 3.8 GB in my program), which is quite wasteful. I would like to know how to configure CuPy to reduce this default memory usage.
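For reference, the documented knobs for capping how much device memory CuPy's pool may allocate are MemoryPool.set_limit() and the CUPY_GPU_MEMORY_LIMIT environment variable. Note that these limit GPU memory held by the pool, whereas the numbers in the reproduction below are host RSS reported by memory_profiler, which is a different quantity. A minimal sketch:

import cupy as cp

# Cap CuPy's default device memory pool at 1 GiB
# (set_limit also accepts fraction=, e.g. fraction=0.5).
cp.get_default_memory_pool().set_limit(size=1 * 1024**3)

# The same cap can be applied from the environment before the process starts:
#   CUPY_GPU_MEMORY_LIMIT="1073741824" python script.py
#   CUPY_GPU_MEMORY_LIMIT="50%"        python script.py

a = cp.arange(10, dtype=cp.float32)              # counted against the 1 GiB cap
print(cp.get_default_memory_pool().get_limit())  # 1073741824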

To Reproduce

import cupy as cp
from memory_profiler import profile

@profile
def mem():
    a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    c = cp.random.randint(0, 256, (3)).astype(cp.float32)

if __name__ == "__main__":
    mem()
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    69    352.4 MiB    352.4 MiB           1   @profile
    70                                         def mem():
    71   3887.2 MiB   3534.8 MiB           1       a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    72   3887.2 MiB      0.0 MiB           1       b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    73   3887.2 MiB      0.0 MiB           1       c = cp.random.randint(0, 256, (3)).astype(cp.float32)
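To check how much of that usage is actually held by CuPy itself, the pool statistics can be queried directly. A small sketch (the pinned host-memory pool only exposes block counts, not byte totals):

import cupy as cp

a = cp.random.randint(0, 256, (3)).astype(cp.float32)

# Device memory held by CuPy's default pool, in bytes.
mempool = cp.get_default_memory_pool()
print("device pool used :", mempool.used_bytes())
print("device pool total:", mempool.total_bytes())

# Pinned host-memory pool: only cached block counts are reported.
pinned = cp.get_default_pinned_memory_pool()
print("pinned free blocks:", pinned.n_free_blocks())

# Return cached free blocks to the driver / OS.
mempool.free_all_blocks()
pinned.free_all_blocks()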

Installation

Wheel (pip install cupy-***)

Environment

OS                           : Linux-5.4.0-124-generic-x86_64-with-glibc2.17
Python Version               : 3.8.13
CuPy Version                 : 11.1.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.23.1
SciPy Version                : 1.9.1
Cython Build Version         : 0.29.24
Cython Runtime Version       : 0.29.32
CUDA Root                    : /usr/local/cuda
nvcc PATH                    : /usr/local/cuda/bin/nvcc
CUDA Build Version           : 11070
CUDA Driver Version          : 11070
CUDA Runtime Version         : 11030
cuBLAS Version               : (available)
cuFFT Version                : 10402
cuRAND Version               : 10204
cuSOLVER Version             : (11, 1, 2)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 3)
Thrust Version               : 101500
CUB Build Version            : 101500
Jitify Build Version         : 4a37de0
cuDNN Build Version          : 8400
cuDNN Version                : 8201
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA GeForce RTX 3060 Laptop GPU
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:01:00.0

Additional Information

No response

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 1
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
chongweiliu commented, Sep 6, 2022

Please, read the documentation https://docs.cupy.dev/en/stable/user_guide/memory.html

I just noticed it was caused by importing torch first.

With CuPy alone, the default usage is OK:

import cupy as cp
from memory_profiler import profile
@profile
def mem():
    a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    c = cp.random.randint(0, 256, (3)).astype(cp.float32)
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   115    123.1 MiB    123.1 MiB           1   @profile
   116                                         def mem():
   117    317.4 MiB    194.3 MiB           1       a = cp.random.randint(0, 256, (3)).astype(cp.float32)
   118    317.4 MiB      0.0 MiB           1       b = cp.random.randint(0, 256, (3)).astype(cp.float32)
   119    317.4 MiB      0.0 MiB           1       c = cp.random.randint(0, 256, (3)).astype(cp.float32)

However, when torch is imported first:

import torch
import cupy as cp
from memory_profiler import profile
@profile
def mem():
    a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    c = cp.random.randint(0, 256, (3)).astype(cp.float32)
Filename: /home/lcw/projects/pipedetection/tools/test.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   115    352.1 MiB    352.1 MiB           1   @profile
   116                                         def mem():
   117   3888.0 MiB   3535.9 MiB           1       a = cp.random.randint(0, 256, (3)).astype(cp.float32)
   118   3888.0 MiB      0.0 MiB           1       b = cp.random.randint(0, 256, (3)).astype(cp.float32)
   119   3888.0 MiB      0.0 MiB           1       c = cp.random.randint(0, 256, (3)).astype(cp.float32)

And when torch is imported partway through the function:

import cupy as cp
from memory_profiler import profile
@profile
def mem():
    a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    import torch
    b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    c = cp.random.randint(0, 256, (3)).astype(cp.float32)
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   115    122.4 MiB    122.4 MiB           1   @profile
   116                                         def mem():
   117    319.9 MiB    197.6 MiB           1       a = cp.random.randint(0, 256, (3)).astype(cp.float32)
   118    537.5 MiB    217.5 MiB           1       import torch
   119   3893.6 MiB   3356.1 MiB           1       b = cp.random.randint(0, 256, (3)).astype(cp.float32)
   120   3893.6 MiB      0.0 MiB           1       c = cp.random.randint(0, 256, (3)).astype(cp.float32)

So, what is going on between CuPy and torch with respect to host memory?

1 reaction
chongweiliu commented, Sep 13, 2022

It’s tracked in #5649, but I guess PyTorch has the same issue. In the meantime, if you are on Linux I’d suggest trying fork mode instead of spawn. Note that you will need to fork after loading all CUDA libraries (import cupy; import torch; torch.cuda.init()) but before calling any CUDA APIs.
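A rough sketch of that fork-based layout, assuming Linux and the standard multiprocessing module. As a conservative variation on the suggestion above, it only imports the CUDA libraries in the parent and defers every CUDA API call (including any torch.cuda.init()) to the forked workers:

import multiprocessing as mp

import cupy as cp  # CUDA libraries are loaded here, in the parent, before the fork

def worker(i):
    # Each forked worker makes its first CUDA call here and so
    # initializes its own CUDA context.
    return float(cp.arange(i + 1, dtype=cp.float32).sum())

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # 'spawn' would reload the CUDA libraries per worker
    with ctx.Pool(2) as pool:
        print(pool.map(worker, range(4)))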

Thanks for the suggestion. After a few days of work, I am now working around the problem by replacing torch with Numba. Numba provides the equivalent capability (sharing device-memory arrays between multiple processes, https://numba.readthedocs.io/en/stable/cuda/ipc.html) and does not cause the memory-usage problem that torch does.
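For completeness, a minimal sketch of the Numba CUDA IPC pattern referenced above (Linux only): a parent process shares a device array with a child via get_ipc_handle(), and the child opens the handle and copies the data back to the host.

import multiprocessing as mp

import numpy as np
from numba import cuda

def child(ipc_handle, queue):
    # Open the handle to get a view of the parent's device array,
    # then copy it back to the host to show the data is shared.
    with ipc_handle as d_view:
        queue.put(d_view.copy_to_host().tolist())

if __name__ == "__main__":
    ctx = mp.get_context("spawn")          # child gets its own CUDA context
    d_arr = cuda.to_device(np.arange(5, dtype=np.float32))
    handle = d_arr.get_ipc_handle()        # picklable, can cross process boundaries
    q = ctx.Queue()
    p = ctx.Process(target=child, args=(handle, q))
    p.start()
    print(q.get())                         # [0.0, 1.0, 2.0, 3.0, 4.0]
    p.join()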
