[FEA] Added functionality to ElementwiseKernel
To begin, cuSignal has increased its use of CuPy’s ElementwiseKernel functionality with great success!
I would like to request two additional features.
Performance
- It is known that adding the __restrict__ qualifier to pointer parameters allows the compiler to perform additional optimizations, as does adding const to read-only data. https://developer.nvidia.com/blog/cuda-pro-tip-optimize-pointer-aliasing/ It would be great if those two options were possible for input parameters, with only __restrict__ for output parameters.
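To make the request concrete, here is a minimal sketch of the kind of signature being asked for, written as plain Python string handling. The helper qualified_signature and the kernel name axpy are hypothetical and are not part of CuPy; the sketch only shows where const and __restrict__ would appear, not how CuPy generates its kernels.

```python
def qualified_signature(name, in_params, out_params):
    """Build a CUDA kernel signature string (hypothetical helper, not CuPy API).

    Inputs are read-only, so they get both const and __restrict__;
    outputs are written to, so they get __restrict__ only.
    """
    args = [f"const {ctype}* __restrict__ {arg}" for ctype, arg in in_params]
    args += [f"{ctype}* __restrict__ {arg}" for ctype, arg in out_params]
    args.append("const int n")  # element count, passed by value
    return f'extern "C" __global__ void {name}({", ".join(args)})'


signature = qualified_signature(
    "axpy",
    in_params=[("double", "x"), ("double", "y")],
    out_params=[("double", "out")],
)
print(signature)
```

A string like this, followed by a kernel body, is the sort of source one would hand to a raw kernel interface such as cp.RawKernel when the qualifiers are needed today.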
Functionality
- Passing in dtype for type inference. Currently, if a CuPy ElementwiseKernel can’t infer a data type, one must be hardcoded. I have discovered it’s faster not to create an empty array for the output and just pass size= to an ElementwiseKernel. But then I have to hardcode the data type of the output (if there’s no input array).
As an example,
import cupy as cp

_bohman_kernel = cp.ElementwiseKernel(
    "",
    "float64 w",  # output type is hardcoded because there is no input to infer it from
    """
    double fac { abs( start + delta * ( i - 1 ) ) };
    if ( i != 0 && i != ( _ind.size() - 1 ) ) {
        w = ( 1 - fac ) * cos( M_PI * fac ) + 1.0 / M_PI * sin( M_PI * fac );
    } else {
        w = 0;
    }
    """,
    "_bohman_kernel",
    options=("-std=c++11",),
    loop_prep="double delta { 2.0 / ( _ind.size() - 1 ) }; \
               double start { -1.0 + delta };",
)
# M is the number of points in the window
w = _bohman_kernel(size=M)
Therefore, if I want the option of float64 and float32, I need to create two kernels plus logic to select the correct one.
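The two-kernel workaround can be sketched as a small factory that builds and caches one kernel per dtype. The names bohman_kernel_args and get_bohman_kernel and the _CTYPES table are hypothetical; the kernel body is adapted from the example above, using = instead of brace initialization so that the float32 variant does not hit narrowing errors. The CUDA source has not been compiled here, so treat it as an assumption rather than a tested kernel.

```python
from functools import lru_cache

# Hypothetical mapping from CuPy dtype names to the C types used in the body.
_CTYPES = {"float32": "float", "float64": "double"}

# Kernel body with a {T} placeholder; literal C++ braces are doubled for str.format.
_BOHMAN_BODY = """
const {T} fac = abs( start + delta * ( i - 1 ) );
if ( i != 0 && i != ( _ind.size() - 1 ) ) {{
    w = ( 1 - fac ) * cos( M_PI * fac ) + 1.0 / M_PI * sin( M_PI * fac );
}} else {{
    w = 0;
}}
"""

_BOHMAN_PREP = (
    "const {T} delta = 2.0 / ( _ind.size() - 1 ); "
    "const {T} start = -1.0 + delta;"
)


def bohman_kernel_args(dtype_name):
    """Return the ElementwiseKernel arguments for one output dtype."""
    try:
        ctype = _CTYPES[dtype_name]
    except KeyError:
        raise ValueError(f"unsupported dtype: {dtype_name!r}") from None
    return {
        "in_params": "",
        "out_params": f"{dtype_name} w",
        "operation": _BOHMAN_BODY.format(T=ctype),
        "name": f"_bohman_kernel_{dtype_name}",
        "loop_prep": _BOHMAN_PREP.format(T=ctype),
    }


@lru_cache(maxsize=None)
def get_bohman_kernel(dtype_name):
    """Build (once per dtype) and return the kernel; needs CuPy and a CUDA device."""
    import cupy as cp  # imported lazily so the helper above works without a GPU

    a = bohman_kernel_args(dtype_name)
    return cp.ElementwiseKernel(
        a["in_params"], a["out_params"], a["operation"], a["name"],
        options=("-std=c++11",), loop_prep=a["loop_prep"],
    )
```

With this in place the call site becomes w = get_bohman_kernel("float32")(size=M). It works, but it is exactly the per-dtype boilerplate that a dtype argument on the kernel call would remove.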
It would be great if I could pass dtype, maybe something like
_bohman_kernel = cp.ElementwiseKernel(
    "",
    "T w, C a",
    """
    T fac { abs( start + delta * ( i - 1 ) ) };
    if ( i != 0 && i != ( _ind.size() - 1 ) ) {
        w = ( 1 - fac ) * cos( M_PI * fac ) + 1.0 / M_PI * sin( M_PI * fac );
        a = C(0, w);
    } else {
        w = 0;
        a = C(w, 0);
    }
    """,
    "_bohman_kernel",
    options=("-std=c++11",),
    loop_prep="double delta { 2.0 / ( _ind.size() - 1 ) }; \
               double start { -1.0 + delta };",
)
w = _bohman_kernel(size=M, dtype=( ("T", float64), ("C", complex128) ), )
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:13 (13 by maintainers)
Top GitHub Comments
I think we definitely have to add __restrict__ to both elementwise and reductions. I will work on an implementation and do some benchmarking.
@leofang Not dumb at all 😄 it’s just personal preference. I like how it catches illegal narrowing at compile time.