Stream in the context-manager form is not used in `ElementwiseKernel` or `ReductionKernel`
This is actually a bug reported back in #1695 that unfortunately went unnoticed.
In examples/stream/map_reduce.py, a list of streams was created for executing cupy.matmul() in parallel, which in this case is backed by a ReductionKernel: https://github.com/cupy/cupy/blob/1af22f57fda92ae35bde806d0c4d110faf4fed52/cupy/core/core.pyx#L2513-L2516
However, inspecting the implementation, I found that ReductionKernel only accepts an explicit stream argument; it does not pick up the current stream: https://github.com/cupy/cupy/blob/32718607a7808ec6bc3a24cf9231a9351f8fc95e/cupy/core/reduction.pxi#L396
In other words, that example was misleading: those streams were not used at all, so all executions were serialized, as can be checked with nvprof and nvvp (see the circle in red).
The same bug also appears in ElementwiseKernel: https://github.com/cupy/cupy/blob/1af22f57fda92ae35bde806d0c4d110faf4fed52/cupy/core/_kernel.pyx#L537
In my opinion, unlike RawKernel, which is not used by any CuPy core functionality, ElementwiseKernel and ReductionKernel should honor the current stream by checking the current stream pointer when no stream argument is explicitly given, since many CuPy functions like cupy.matmul() do not support passing in a stream. A similar approach is already adopted in the FFT module; see #2362.
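The fallback proposed above can be modeled in plain Python. This is a hypothetical sketch, not CuPy's code. `Stream`, `current_stream()`, `resolve_stream()` and `NULL_STREAM` are illustrative names, and no GPU work happens. A `with` block pushes a stream onto a per-thread stack. A launch that receives no explicit stream uses whatever stream is innermost, and falls back to the null stream otherwise.

```python
import threading


class Stream:
    """Hypothetical stand-in for a CUDA stream; only its identity matters here."""

    _local = threading.local()  # per-thread storage for the "current stream" stack

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Stream({self.name!r})"

    @classmethod
    def _stack(cls):
        # Each thread lazily gets its own stack of entered streams.
        if not hasattr(cls._local, "stack"):
            cls._local.stack = []
        return cls._local.stack

    def __enter__(self):
        Stream._stack().append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leaving the block restores whatever stream was current before.
        Stream._stack().pop()
        return False


NULL_STREAM = Stream("null")  # hypothetical sentinel for the default (null) stream


def current_stream():
    """Return the innermost stream entered with `with`, else the null stream."""
    stack = Stream._stack()
    return stack[-1] if stack else NULL_STREAM


def resolve_stream(stream=None):
    """Proposed rule: an explicit argument wins; otherwise use the current stream."""
    return stream if stream is not None else current_stream()


s1, s2 = Stream("s1"), Stream("s2")
with s1:
    print(resolve_stream())        # Stream('s1')
    with s2:
        print(resolve_stream())    # Stream('s2')
    print(resolve_stream(s2))      # Stream('s2'): explicit argument wins
print(resolve_stream())            # Stream('null')
```

The stack is thread-local because the notion of a "current stream" is assumed here to be per thread. A stream entered in one thread should not redirect launches made from another thread.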
Issue Analytics
- Created 4 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
Actually, the previous code was fine; just increasing the matrix sizes shows real overlap.
Apparently, if stream is set to None in those two functions, the current stream is retrieved when the kernel is launched: https://github.com/cupy/cupy/blob/32718607a7808ec6bc3a24cf9231a9351f8fc95e/cupy/cuda/function.pyx#L126
This is done in the linear_launch function: https://github.com/cupy/cupy/blob/32718607a7808ec6bc3a24cf9231a9351f8fc95e/cupy/cuda/function.pyx#L174