How to reduce CuPy memory usage?
Description
When using CuPy, it takes up a lot of memory by default (about 3.8 GB in my program), which is quite a waste of space. I would like to know which settings reduce this default memory usage.
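For reference, the two standard knobs for capping how much device memory CuPy's pool will hold are the memory-pool limit and the CUPY_GPU_MEMORY_LIMIT environment variable; the minimal sketch below shows both. Note these cap GPU memory held by the pool, not the host RSS that memory_profiler reports, so they may not address this particular symptom.

import cupy as cp

# Cap the default device memory pool at 1 GiB. The same limit can be set
# before import via the CUPY_GPU_MEMORY_LIMIT environment variable
# (e.g. "1073741824" or "50%").
pool = cp.get_default_memory_pool()
pool.set_limit(size=1 * 1024**3)

# Alternative: disable pooling entirely so freed memory goes straight
# back to the driver (at the cost of slower allocations).
# cp.cuda.set_allocator(None)

a = cp.random.randint(0, 256, 3).astype(cp.float32)
print(pool.used_bytes(), pool.total_bytes())

# Return any cached blocks to the device when done.
pool.free_all_blocks()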
To Reproduce
import cupy as cp
from memory_profiler import profile

@profile
def mem():
    a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    c = cp.random.randint(0, 256, (3)).astype(cp.float32)

mem()
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    69    352.4 MiB    352.4 MiB           1    @profile
    70                                          def mem():
    71   3887.2 MiB   3534.8 MiB           1        a = cp.random.randint(0, 256, (3)).astype(cp.float32)
    72   3887.2 MiB      0.0 MiB           1        b = cp.random.randint(0, 256, (3)).astype(cp.float32)
    73   3887.2 MiB      0.0 MiB           1        c = cp.random.randint(0, 256, (3)).astype(cp.float32)
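Note that memory_profiler measures the host-side resident set of the Python process; CuPy's device-side pool usage has to be read from the pool objects themselves. A small sketch for separating the two (API calls as documented in CuPy's memory management docs):

import cupy as cp

mempool = cp.get_default_memory_pool()
pinned_mempool = cp.get_default_pinned_memory_pool()

a = cp.random.randint(0, 256, 3).astype(cp.float32)

# Bytes actually backing live arrays vs. bytes the pool has cached.
print("device pool used  :", mempool.used_bytes())
print("device pool total :", mempool.total_bytes())
print("pinned free blocks:", pinned_mempool.n_free_blocks())

del a
mempool.free_all_blocks()
pinned_mempool.free_all_blocks()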
Installation
Wheel (pip install cupy-***)
Environment
OS : Linux-5.4.0-124-generic-x86_64-with-glibc2.17
Python Version : 3.8.13
CuPy Version : 11.1.0
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.23.1
SciPy Version : 1.9.1
Cython Build Version : 0.29.24
Cython Runtime Version : 0.29.32
CUDA Root : /usr/local/cuda
nvcc PATH : /usr/local/cuda/bin/nvcc
CUDA Build Version : 11070
CUDA Driver Version : 11070
CUDA Runtime Version : 11030
cuBLAS Version : (available)
cuFFT Version : 10402
cuRAND Version : 10204
cuSOLVER Version : (11, 1, 2)
cuSPARSE Version : (available)
NVRTC Version : (11, 3)
Thrust Version : 101500
CUB Build Version : 101500
Jitify Build Version : 4a37de0
cuDNN Build Version : 8400
cuDNN Version : 8201
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA GeForce RTX 3060 Laptop GPU
Device 0 Compute Capability : 86
Device 0 PCI Bus ID : 0000:01:00.0
Additional Information
No response
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I just noticed it was caused by importing torch first.
With CuPy alone, the default memory usage is fine, but when torch is imported before CuPy the host memory usage jumps.
So, what happens between CuPy and torch on host memory?
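One way to confirm the interaction the comment describes is to print the process RSS before and after the imports; the sketch below is only an illustration and assumes psutil is available.

import os
import psutil

def rss_mib():
    # Resident set size of the current process, in MiB.
    return psutil.Process(os.getpid()).memory_info().rss / 2**20

print("baseline RSS:", rss_mib())

# Uncomment to reproduce the reported jump when torch is imported first:
# import torch

import cupy as cp
a = cp.random.randint(0, 256, 3).astype(cp.float32)
print("RSS after first CuPy allocation:", rss_mib())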
Thanks for the suggestion. After a few days of work, I am now working around the problem by replacing torch with Numba. Numba offers an equivalent API for sharing arrays in device memory between multiple processes (https://numba.readthedocs.io/en/stable/cuda/ipc.html) and does not cause the host memory usage problem that torch does.
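For reference, a minimal sketch of the Numba CUDA IPC pattern from the linked docs (the parent exports an IPC handle, the child opens it in its own process); this is an illustration under assumptions, not the poster's actual code, and CUDA IPC requires Linux plus a spawned (not forked) child process.

import multiprocessing as mp
import numpy as np
from numba import cuda

def child(handle):
    # Opening the IPC handle maps the parent's device array into this
    # process without copying it through host memory.
    with handle as d_arr:
        print("child sees:", d_arr.copy_to_host())

if __name__ == "__main__":
    d_arr = cuda.to_device(np.arange(3, dtype=np.float32))
    handle = d_arr.get_ipc_handle()   # picklable handle for other processes

    ctx = mp.get_context("spawn")     # child must create its own CUDA context
    p = ctx.Process(target=child, args=(handle,))
    p.start()
    p.join()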