Cupy function doesn't utilize pinned memory inside stream
See original GitHub issue
- Conditions
CuPy Version : 7.2.0
CUDA Root : /usr/common/software/cuda/10.1.243
CUDA Build Version : 10010
CUDA Driver Version : 10020
CUDA Runtime Version : 10010
cuBLAS Version : 10202
cuFFT Version : 10102
cuRAND Version : 10102
cuSOLVER Version : (10, 3, 0)
cuSPARSE Version : 10301
NVRTC Version : (10, 1)
cuDNN Build Version : 7605
cuDNN Version : 7605
NCCL Build Version : 2506
NCCL Runtime Version : 2506
- Code to reproduce
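import numpy as np
import cupy as cp
import cupy.linalg
import cupyx.scipy.sparse  # explicit import so cpx.scipy.sparse resolves below
import cupyx.scipy.special
import cupyx as cpx

# Issue all of the work below on a non-default stream
stream_1 = cp.cuda.stream.Stream()
with stream_1:
    cp.random.seed(1)
    A = cp.random.rand(10000, 10000)
    # Round-trip through CSR, then run eigh on the dense result
    u, v = cp.linalg.eigh(cpx.scipy.sparse.csr_matrix(A).todense())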
- Error messages, stack traces, or logs
By profiling the above code, I observe that there are many small bursts of cudaMemcpy2DAsync calls happening in eigh, despite never explicitly requesting cupy to transfer data back. I am putting the cupy call in a stream. How do I force cupy to use pinned memory efficiently?
eigh_profile5.qdrep.zip
Issue Analytics
- State:
- Created 4 years ago
- Comments:16 (11 by maintainers)
Top GitHub Comments
FYI this was opened as a bug internally in NVIDIA.
Looks like those data transfers are made outside of CuPy (likely in cuSPARSE or cuSOLVER). IIUC almost all CuPy internal kernels are prefixed with cupy_ (or cupyx_), but I don’t see any in those transfers.
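For anyone wanting to repeat that check on their own profile, here is a minimal sketch of the prefix test described above. It assumes you have already exported the kernel and API names from the profiler into a plain list of strings; the helper name `split_by_origin` and the sample names (other than cudaMemcpy2DAsync, which comes from the report) are made up for illustration and are not part of CuPy or of any profiler API.

```python
# Prefixes mentioned in the comment above for CuPy's own kernels.
CUPY_PREFIXES = ("cupy_", "cupyx_")


def split_by_origin(names):
    """Group names into those carrying a CuPy prefix and everything else."""
    groups = {"cupy": [], "other": []}
    for name in names:
        # str.startswith accepts a tuple and matches any of its entries.
        key = "cupy" if name.startswith(CUPY_PREFIXES) else "other"
        groups[key].append(name)
    return groups


# Hypothetical names standing in for entries exported from a profile.
sample = [
    "cupy_example_kernel",
    "cupyx_example_kernel",
    "cudaMemcpy2DAsync",
    "third_party_example_kernel",
]

groups = split_by_origin(sample)
print("CuPy-prefixed:", groups["cupy"])
print("Other:", groups["other"])
```

Anything that lands in the "other" group is only a candidate for work issued outside CuPy. Since the comment says "almost all" kernels carry the prefix, this is a hint rather than proof.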