Cupy function doesn't utilize pinned memory inside stream

  • Conditions

CuPy Version : 7.2.0
CUDA Root : /usr/common/software/cuda/10.1.243
CUDA Build Version : 10010
CUDA Driver Version : 10020
CUDA Runtime Version : 10010
cuBLAS Version : 10202
cuFFT Version : 10102
cuRAND Version : 10102
cuSOLVER Version : (10, 3, 0)
cuSPARSE Version : 10301
NVRTC Version : (10, 1)
cuDNN Build Version : 7605
cuDNN Version : 7605
NCCL Build Version : 2506
NCCL Runtime Version : 2506

  • Code to reproduce

import numpy as np
import cupy as cp
import cupy.linalg
import cupyx.scipy.special
import cupyx as cpx

stream_1 = cp.cuda.stream.Stream()
with stream_1:
    cp.random.seed(1)
    A = cp.random.rand(10000, 10000)
    u, v = cp.linalg.eigh(cpx.scipy.sparse.csr_matrix(A).todense())
  • Error messages, stack traces, or logs

Profiling the above code shows many small bursts of cudaMemcpy2DAsync calls inside eigh, even though I never explicitly ask CuPy to transfer data back. The CuPy call runs inside a stream. How do I force CuPy to use pinned memory efficiently?

Attachments: Screenshot from 2020-03-04 18-41-52, eigh_profile5.qdrep.zip

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 16 (11 by maintainers)

Top GitHub Comments

jakirkham commented, Mar 31, 2020 (4 reactions)

FYI this was opened as a bug internally in NVIDIA.

leofang commented, Mar 6, 2020 (4 reactions)

Looks like those data transfers are made outside of CuPy (likely in cuSPARSE or cuSOLVER). IIUC almost all CuPy internal kernels are prefixed with cupy_ (or cupyx_), but I don’t see any in those transfers.
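
The prefix check described in this comment can be sketched in a few lines of plain Python. This is only an illustration: the sample names below are hypothetical placeholders (apart from cudaMemcpy2DAsync, which is mentioned in the issue), and the helper functions are not part of CuPy. Since the comment says "almost all" CuPy kernels carry the prefix, treat the result as a hint rather than proof.

```python
# Prefixes that, per the comment above, mark almost all CuPy internal kernels.
CUPY_PREFIXES = ("cupy_", "cupyx_")

# Hypothetical names as they might be copied out of a profiler timeline.
# Only cudaMemcpy2DAsync comes from the issue; the rest are placeholders.
SAMPLE_NAMES = [
    "cupy_example_kernel",
    "cupyx_example_kernel",
    "cudaMemcpy2DAsync",
    "some_library_kernel",
]


def is_cupy_kernel(name: str) -> bool:
    """Return True if the name carries one of the CuPy prefixes."""
    return name.startswith(CUPY_PREFIXES)


def split_by_origin(names):
    """Split names into (likely CuPy, likely outside CuPy), keeping order."""
    inside, outside = [], []
    for name in names:
        (inside if is_cupy_kernel(name) else outside).append(name)
    return inside, outside


inside, outside = split_by_origin(SAMPLE_NAMES)
print("likely CuPy:", inside)
print("likely outside CuPy:", outside)
```

Entries that land in the second list are candidates for work issued by a library underneath CuPy, which is what the comment suspects for the transfers seen in the profile.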
