[FEA] Added functionality to ElementwiseKernel

To begin, cuSignal has increased its use of CuPy’s ElementwiseKernel functionality with great success!

I would like to request two additional features.

Performance

It is well known that adding the __restrict__ qualifier to pointer parameters allows the compiler to perform additional optimizations, as does marking read-only data const (see https://developer.nvidia.com/blog/cuda-pro-tip-optimize-pointer-aliasing/). It would be great if both options were available: const and __restrict__ for input parameters, and __restrict__ only for output parameters.
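For reference, this is roughly what the requested qualifiers look like when the kernel is written by hand. It is only a sketch: it uses cp.RawKernel rather than ElementwiseKernel, and the `scale` kernel and `run_scale` helper are made-up names for illustration, not part of cuSignal or CuPy.

```python
# Hand-written CUDA source showing the qualifiers requested above.
# x is read-only (const) and neither pointer aliases the other (__restrict__).
SCALE_SRC = r"""
extern "C" __global__
void scale(const double* __restrict__ x,
           double* __restrict__ y,
           const double a,
           const int n)
{
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i];
    }
}
"""


def run_scale(x, a):
    """Compute a * x for a non-empty 1-D float64 CuPy array (needs a CUDA GPU)."""
    # Imported lazily so the source string can be inspected without CuPy installed.
    import cupy as cp

    kernel = cp.RawKernel(SCALE_SRC, "scale")
    y = cp.empty_like(x)
    n = x.size
    threads = 128
    blocks = (n + threads - 1) // threads
    # Scalars are passed as NumPy scalar types so their C types match the signature.
    kernel((blocks,), (threads,), (x, y, cp.float64(a), cp.int32(n)))
    return y
```

With a GPU available, `run_scale(cp.arange(8, dtype=cp.float64), 2.0)` would launch the kernel. The request is for ElementwiseKernel to be able to emit the same qualifiers on the parameters it generates.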

Functionality

Passing in a dtype for type inference. Currently, if a CuPy ElementwiseKernel can’t infer a data type, one must be hardcoded. I have discovered that it’s faster not to create an empty output array and instead just pass size= to the ElementwiseKernel. But then I have to hardcode the data type of the output (if there’s no input array).

As an example,

_bohman_kernel = cp.ElementwiseKernel(
    "",
    "float64 w",
    """
    double fac { abs( start + delta * ( i - 1 ) ) };
    if ( i != 0 && i != ( _ind.size() - 1 ) ) {
        w = ( 1 - fac ) * cos( M_PI * fac ) + 1.0 / M_PI * sin( M_PI * fac );
    } else {
        w = 0;
    }
    """,
    "_bohman_kernel",
    options=("-std=c++11",),
    loop_prep="double delta { 2.0 / ( _ind.size() - 1 ) }; \
               double start { -1.0 + delta };",
)

w = _bohman_kernel(size=M)

Therefore, if I want the option of both float64 and float32, I need to create two kernels plus logic to select the correct one.
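To make the current workaround concrete, here is a minimal sketch of that selection logic. It assumes the only per-dtype difference is the output parameter string, so the body keeps computing in double as in the kernel above. The helper names `bohman_kernel_args`, `get_bohman_kernel`, and `bohman` are hypothetical.

```python
from functools import lru_cache

# Output dtypes this sketch supports (an assumption, not a CuPy limit).
_SUPPORTED = ("float32", "float64")

# Same body as the kernel above; intermediate math stays in double.
_BOHMAN_BODY = """
    double fac { abs( start + delta * ( i - 1 ) ) };
    if ( i != 0 && i != ( _ind.size() - 1 ) ) {
        w = ( 1 - fac ) * cos( M_PI * fac ) + 1.0 / M_PI * sin( M_PI * fac );
    } else {
        w = 0;
    }
"""

_BOHMAN_LOOP_PREP = """
    double delta { 2.0 / ( _ind.size() - 1 ) };
    double start { -1.0 + delta };
"""


def bohman_kernel_args(dtype_name):
    """Return (in_params, out_params, operation, name) for one output dtype."""
    if dtype_name not in _SUPPORTED:
        raise ValueError(f"unsupported dtype: {dtype_name!r}")
    return ("", f"{dtype_name} w", _BOHMAN_BODY, f"_bohman_kernel_{dtype_name}")


@lru_cache(maxsize=None)
def get_bohman_kernel(dtype_name):
    """Build (once per dtype) the ElementwiseKernel; needs CuPy and a CUDA GPU."""
    import cupy as cp  # lazy import: the argument builder works without CuPy

    return cp.ElementwiseKernel(
        *bohman_kernel_args(dtype_name),
        options=("-std=c++11",),
        loop_prep=_BOHMAN_LOOP_PREP,
    )


def bohman(M, dtype_name="float64"):
    """Select the kernel for dtype_name and evaluate the window of length M."""
    return get_bohman_kernel(dtype_name)(size=M)
```

Calling `bohman(M, "float32")` then picks the float32 variant. This is exactly the boilerplate a dtype argument on the kernel call would make unnecessary.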

It would be great if I could pass dtype, maybe something like

_bohman_kernel = cp.ElementwiseKernel(
    "",
    "T w, C a",
    """
    T fac { abs( start + delta * ( i - 1 ) ) };
    if ( i != 0 && i != ( _ind.size() - 1 ) ) {
        w = ( 1 - fac ) * cos( M_PI * fac ) + 1.0 / M_PI * sin( M_PI * fac );
        a = C(0, w);
    } else {
        w = 0;
        a = C(w, 0);
    }
    """,
    "_bohman_kernel",
    options=("-std=c++11",),
    loop_prep="double delta { 2.0 / ( _ind.size() - 1 ) }; \
               double start { -1.0 + delta };",
)

w = _bohman_kernel(size=M, dtype=( ("T", float64), ("C", complex128) ), )

@z-ryan1 @awthomp @leofang