
How to reduce CuPy memory usage?

See original GitHub issue

Description

When using CuPy, it takes up a lot of memory by default (about 3.8 GB in my program), which is quite wasteful. I would like to know how to configure CuPy to reduce this default memory usage.
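For reference, the documented knobs for capping how much device memory CuPy's pool may allocate are MemoryPool.set_limit() and the CUPY_GPU_MEMORY_LIMIT environment variable. Note that these limit GPU memory held by the pool, whereas the numbers in the reproduction below are host RSS reported by memory_profiler, which is a different quantity. A minimal sketch:

import cupy as cp

# Cap CuPy's default device memory pool at 1 GiB
# (set_limit also accepts fraction=, e.g. fraction=0.5).
cp.get_default_memory_pool().set_limit(size=1 * 1024**3)

# The same cap can be applied from the environment before the process starts:
#   CUPY_GPU_MEMORY_LIMIT="1073741824" python script.py
#   CUPY_GPU_MEMORY_LIMIT="50%"        python script.py

a = cp.arange(10, dtype=cp.float32)              # counted against the 1 GiB cap
print(cp.get_default_memory_pool().get_limit())  # 1073741824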

To Reproduce

import cupy as cp
from memory_profiler import profile

@profile
def mem():
    a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    c = cp.random.randint(0, 256, (3)).astype(cp.float32)

if __name__ == "__main__":
    mem()
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    69    352.4 MiB    352.4 MiB           1   @profile
    70                                         def mem():
    71   3887.2 MiB   3534.8 MiB           1       a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    72   3887.2 MiB      0.0 MiB           1       b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    73   3887.2 MiB      0.0 MiB           1       c = cp.random.randint(0, 256, (3)).astype(cp.float32)
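To check how much of that usage is actually held by CuPy itself, the pool statistics can be queried directly. A small sketch (the pinned host-memory pool only exposes block counts, not byte totals):

import cupy as cp

a = cp.random.randint(0, 256, (3)).astype(cp.float32)

# Device memory held by CuPy's default pool, in bytes.
mempool = cp.get_default_memory_pool()
print("device pool used :", mempool.used_bytes())
print("device pool total:", mempool.total_bytes())

# Pinned host-memory pool: only cached block counts are reported.
pinned = cp.get_default_pinned_memory_pool()
print("pinned free blocks:", pinned.n_free_blocks())

# Return cached free blocks to the driver / OS.
mempool.free_all_blocks()
pinned.free_all_blocks()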

Installation

Wheel (pip install cupy-***)

Environment

OS                           : Linux-5.4.0-124-generic-x86_64-with-glibc2.17
Python Version               : 3.8.13
CuPy Version                 : 11.1.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.23.1
SciPy Version                : 1.9.1
Cython Build Version         : 0.29.24
Cython Runtime Version       : 0.29.32
CUDA Root                    : /usr/local/cuda
nvcc PATH                    : /usr/local/cuda/bin/nvcc
CUDA Build Version           : 11070
CUDA Driver Version          : 11070
CUDA Runtime Version         : 11030
cuBLAS Version               : (available)
cuFFT Version                : 10402
cuRAND Version               : 10204
cuSOLVER Version             : (11, 1, 2)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 3)
Thrust Version               : 101500
CUB Build Version            : 101500
Jitify Build Version         : 4a37de0
cuDNN Build Version          : 8400
cuDNN Version                : 8201
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA GeForce RTX 3060 Laptop GPU
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:01:00.0

Additional Information

No response

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 1
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
chongweiliu commented, Sep 6, 2022

Please, read the documentation https://docs.cupy.dev/en/stable/user_guide/memory.html

I just noticed it was caused by importing torch first.

With CuPy alone, the default usage is OK:

import cupy as cp
from memory_profiler import profile
@profile
def mem():
    a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    c = cp.random.randint(0, 256, (3)).astype(cp.float32)
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   115    123.1 MiB    123.1 MiB           1   @profile
   116                                         def mem():
   117    317.4 MiB    194.3 MiB           1       a = cp.random.randint(0, 256, (3)).astype(cp.float32)
   118    317.4 MiB      0.0 MiB           1       b = cp.random.randint(0, 256, (3)).astype(cp.float32)
   119    317.4 MiB      0.0 MiB           1       c = cp.random.randint(0, 256, (3)).astype(cp.float32)

However, when torch is imported first:

import torch
import cupy as cp
from memory_profiler import profile
@profile
def mem():
    a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    c = cp.random.randint(0, 256, (3)).astype(cp.float32)
Filename: /home/lcw/projects/pipedetection/tools/test.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   115    352.1 MiB    352.1 MiB           1   @profile
   116                                         def mem():
   117   3888.0 MiB   3535.9 MiB           1       a = cp.random.randint(0, 256, (3)).astype(cp.float32)
   118   3888.0 MiB      0.0 MiB           1       b = cp.random.randint(0, 256, (3)).astype(cp.float32)
   119   3888.0 MiB      0.0 MiB           1       c = cp.random.randint(0, 256, (3)).astype(cp.float32)

And when torch is imported partway through the function:

import cupy as cp
from memory_profiler import profile
@profile
def mem():
    a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    import torch
    b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    c = cp.random.randint(0, 256, (3)).astype(cp.float32)
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   115    122.4 MiB    122.4 MiB           1   @profile
   116                                         def mem():
   117    319.9 MiB    197.6 MiB           1       a = cp.random.randint(0, 256, (3)).astype(cp.float32)
   118    537.5 MiB    217.5 MiB           1       import torch
   119   3893.6 MiB   3356.1 MiB           1       b = cp.random.randint(0, 256, (3)).astype(cp.float32)
   120   3893.6 MiB      0.0 MiB           1       c = cp.random.randint(0, 256, (3)).astype(cp.float32)

So, what is going on between CuPy and torch with respect to host memory?

1 reaction
chongweiliu commented, Sep 13, 2022

It’s tracked in #5649, but I guess PyTorch has the same issue. In the meantime, if you are on Linux I’d suggest trying fork mode instead of spawn. Note that you will need to fork after loading all CUDA libraries (import cupy; import torch; torch.cuda.init()) but before calling any CUDA APIs.
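A rough sketch of that fork-based layout, assuming Linux and the standard multiprocessing module. As a conservative variation on the suggestion above, it only imports the CUDA libraries in the parent and defers every CUDA API call (including any torch.cuda.init()) to the forked workers:

import multiprocessing as mp

import cupy as cp  # CUDA libraries are loaded here, in the parent, before the fork

def worker(i):
    # Each forked worker makes its first CUDA call here and so
    # initializes its own CUDA context.
    return float(cp.arange(i + 1, dtype=cp.float32).sum())

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # 'spawn' would reload the CUDA libraries per worker
    with ctx.Pool(2) as pool:
        print(pool.map(worker, range(4)))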

Thanks for the suggestion. After a few days of work, I am now working around the problem by replacing torch with Numba. Numba provides the equivalent capability (sharing device-memory arrays between multiple processes, https://numba.readthedocs.io/en/stable/cuda/ipc.html) and does not cause the memory-usage problem that torch does.
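For completeness, a minimal sketch of the Numba CUDA IPC pattern referenced above (Linux only): a parent process shares a device array with a child via get_ipc_handle(), and the child opens the handle and copies the data back to the host.

import multiprocessing as mp

import numpy as np
from numba import cuda

def child(ipc_handle, queue):
    # Open the handle to get a view of the parent's device array,
    # then copy it back to the host to show the data is shared.
    with ipc_handle as d_view:
        queue.put(d_view.copy_to_host().tolist())

if __name__ == "__main__":
    ctx = mp.get_context("spawn")          # child gets its own CUDA context
    d_arr = cuda.to_device(np.arange(5, dtype=np.float32))
    handle = d_arr.get_ipc_handle()        # picklable, can cross process boundaries
    q = ctx.Queue()
    p = ctx.Process(target=child, args=(handle, q))
    p.start()
    print(q.get())                         # [0.0, 1.0, 2.0, 3.0, 4.0]
    p.join()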
