About unified memory in CuPy
Hi CuPy team,
Is there any documentation describing which CuPy functions support unified memory?
So far I’ve tested two examples. The first one is a dot product between two large matrices, which worked for me:
import cupy as cp
pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
cp.cuda.set_allocator(pool.malloc)
size = 32768
a = cp.ones((size, size)) # 8GB
b = cp.ones((size, size)) # 8GB
cp.dot(a, b)
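For scale, the footprint of this example can be estimated with plain arithmetic. The sketch below assumes float64 elements (the default dtype of `cp.ones`) and counts only the two inputs and the result, ignoring any workspace the library may allocate. The helper name `array_gib` is made up for illustration, and the 16 GB of HBM2 per GPU is treated as 16 GiB.

```python
# Rough footprint estimate for the dot-product example above.
# Assumptions: float64 elements (8 bytes); only a, b and the result are counted.

GIB = 1024 ** 3


def array_gib(shape, itemsize=8):
    """Return the size of a dense array in GiB (hypothetical helper)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize / GIB


size = 32768
per_array = array_gib((size, size))  # a, b and the result share this shape
total = 3 * per_array                # a + b + result of cp.dot(a, b)
gpu_memory_gib = 16                  # 16 GB HBM2 per GPU, treated as 16 GiB

print(f"per array: {per_array:.1f} GiB, total: {total:.1f} GiB")
print("exceeds one GPU:", total > gpu_memory_gib)
```

Under these assumptions the three arrays together do not fit on a single 16 GB GPU, so the successful run suggests that oversubscription through managed memory is actually being exercised.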
The second is a simple SVD test:
import os
import time
import numpy as np
import cupy as cp
from cupy.cuda.memory import malloc_managed
cp.cuda.set_allocator(malloc_managed)
tAccum = 0
x = np.random.random ((50000,10000))
print("MB ", x.nbytes / 1024**2)  # bytes -> MB (dividing by 1024 once would give KB)
t0 = time.time()
d_x = cp.asarray(x)
t1 = time.time()
dt = t1 - t0
print('H to D transfer ', dt, ' sec')
tAccum += dt
t0 = time.time()
d_u, d_s, d_v = cp.linalg.svd(d_x)
t1 = time.time()
dt = t1 - t0
print('SVD ', dt, ' sec')
tAccum += dt
t0 = time.time()
u = cp.asnumpy(d_u)
s = cp.asnumpy(d_s)
v = cp.asnumpy(d_v)
t1 = time.time()
dt = t1 - t0
print('D to H transfer ', dt, ' sec')
tAccum += dt
print ('Total ', tAccum, ' sec')
It fails with the following error:
Traceback (most recent call last):
File "svd.py", line 25, in <module>
d_u, d_s, d_v = cp.linalg.svd(d_x)
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.11_gcc_6.4.0/lib/python3.7/site-packages/cupy-7.1.1-py3.7-linux-ppc64le.egg/cupy/linalg/decomposition.py", line 307, in svd
buffersize = gesvd_bufferSize(handle, m, n)
File "cupy/cuda/cusolver.pyx", line 1237, in cupy.cuda.cusolver.dgesvd_bufferSize
File "cupy/cuda/cusolver.pyx", line 1242, in cupy.cuda.cusolver.dgesvd_bufferSize
File "cupy/cuda/cusolver.pyx", line 440, in cupy.cuda.cusolver.check_status
cupy.cuda.cusolver.CUSOLVERError: CUSOLVER_STATUS_INVALID_VALUE
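As a side note on problem size, the shapes a full SVD returns for this input can be worked out by hand. The sketch below assumes float64 data and the default `full_matrices=True` of `cp.linalg.svd`, under which `u` is square (M x M). It only does arithmetic and does not attempt to explain the cuSOLVER error; `svd_output_gib` is a hypothetical helper.

```python
# Size estimate for the inputs and outputs of the SVD test above.
# Assumptions: float64 elements (8 bytes); shapes follow the usual
# full_matrices convention (u is M x M when full_matrices=True).

GIB = 1024 ** 3


def svd_output_gib(m, n, full_matrices=True, itemsize=8):
    """Return the sizes in GiB of x, u, s and v for an m x n input (hypothetical helper)."""
    k = min(m, n)
    u_cols = m if full_matrices else k
    v_rows = n if full_matrices else k
    sizes = {
        "x": m * n * itemsize,
        "u": m * u_cols * itemsize,
        "s": k * itemsize,
        "v": v_rows * n * itemsize,
    }
    return {name: nbytes / GIB for name, nbytes in sizes.items()}


full = svd_output_gib(50000, 10000)
reduced = svd_output_gib(50000, 10000, full_matrices=False)

for name in ("x", "u", "s", "v"):
    print(f"{name}: full={full[name]:.2f} GiB, reduced={reduced[name]:.2f} GiB")
```

Under these assumptions the square `u` alone is larger than the 16 GB of a single GPU, while with `full_matrices=False` it is the same size as `x`. If the full `u` is not needed, the reduced form may be worth trying.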
We are benchmarking on POWER9 to understand how CuPy behaves with datasets larger than 16 GB. Knowing which CuPy features work with unified memory and which do not would let us progress faster.
P.S. According to section 3.6 of this technical report, unified memory can be used with cuSOLVER:
https://developer.nvidia.com/sites/default/files/akamai/cuda/files/Misc/mygpu.pdf
System configuration
IBM Power System AC922
2x POWER9 CPU (84 SMT cores each)
512 GB RAM
6x NVIDIA Volta GPU with 16 GB HBM2
GCC 6.4
CUDA 10.1.168
NVIDIA Driver 418.67
CuPy 7.1.1
Thanks,
Benjamin
Top GitHub Comments
FYI, response from the cuSOLVER team.
Thank you all for your comments and feedback.
Good to know it is not a problem directly related to how CuPy uses unified memory.
@emcastillo @anaruse @leofang We are testing and benchmarking CuPy and NV Rapids with large memory allocations on the Summit supercomputer, using its production environment. Our ultimate goal is to offer scalable CPU- and GPU-based analytics to our users.