Discussion for possible enhancements of the new CUB support
With the great effort by @anaruse in #2090, I’ve seen encouraging performance boosts. Below is a list of possible improvements I can think of, either for offering extensive support or for enabling even more boost. I am interested in knowing what I’ve missed or misunderstood.
- Support complex numbers (#2538)
- Allow using CUDA streams (#2555): if users set up a context manager like this

      with stream:
          arr.sum()
          # do other stuff

  the non-default stream should be honored. All of the CUB functions introduced in #2090 support an optional stream argument. We just need to pick up the current stream pointer during setup and modify the wrappers.
- Change the CUB wrappers from `def` to `cdef` or `cpdef` (UPDATE: see https://github.com/cupy/cupy/issues/2519#issuecomment-538205781): currently they are all Python `def` functions. This could be beneficial for performance. In particular, if we don’t want to expose those wrappers to end users, `cdef` would be a nice choice.
- Support batch reduction for contiguous arrays (#2562): currently only a full reduction is supported, but if a reduction over the last axes of a contiguous array of shape, say, `(X, Y, Z)`, is needed, this seems possible with a naive loop over the remaining axes. In other words, in this case we can use CUB to do `arr.sum(axis=2)` or `arr.sum(axis=(1,2))`, assuming `arr` is C contiguous. This resembles the current treatment of `PlanNd` in the FFT module.
- Document how to enable CUB support if built from source: need to set `CUB_PATH` and `CUB_DISABLED`. -> could be avoided if the CUB source code is bundled (#2584)
- Support `argmin` and `argmax` (#2596 enables a global (no `axis`) search)
- Support half-precision floating points (#2600)
- Support F-contiguous arrays (#2682)
- Support sparse matrix operation (#2698)
- Honor the `keepdims` argument (#2725)
Question: (from https://github.com/cupy/cupy/pull/2508#issuecomment-536368493): is Jenkins configured to test CUB functionalities? UPDATE: No, see https://github.com/cupy/cupy/pull/2538#issuecomment-543507886.
Top GitHub Comments
I am closing this meta-issue, as most of the listed tasks are completed except for the source-tree bundling (#2584) and the tests (#2598). Thanks to everyone for making all these improvements! It’s been a not-so-short, perhaps a bit bumpy journey since @anaruse added the initial support. 🙂
As documented in https://github.com/cupy/cupy/pull/2725#issuecomment-559879307, we could also use CUB to accelerate `mean` (which is also used internally by `std` and `var`).
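As a rough illustration of why accelerating `mean` would carry over to `std` and `var`, here is a small pure-Python sketch. It is only an assumption about the general shape of the computation, not CuPy's actual code: `reduce_sum` is a hypothetical stand-in for an accelerated full reduction, and the population variance (no degrees-of-freedom correction) is used for simplicity.

```python
import math


def reduce_sum(values):
    """Hypothetical stand-in for an accelerated full-reduction sum."""
    return sum(values)


def mean(values):
    # The mean is a sum followed by a single division.
    return reduce_sum(values) / len(values)


def var(values):
    # Population variance: the mean of squared deviations from the mean,
    # so it goes through `mean` (and hence `reduce_sum`) twice.
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def std(values):
    # The standard deviation is the square root of the variance.
    return math.sqrt(var(values))


samples = [1.0, 2.0, 3.0, 4.0]
sample_mean = mean(samples)
sample_var = var(samples)
sample_std = std(samples)
```

In this sketch every statistic bottoms out in `reduce_sum`, so a faster reduction there would benefit all three.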