Support more CUDA API in CuPy JIT

Part of #4290

CuPy JIT needs broader coverage of the functions and attributes that CUDA supports. Each function can be added to CuPy JIT by writing a class that inherits from BuiltinFunc.
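To illustrate the "one class per function" pattern described above, here is a minimal toy sketch. It is not CuPy's internal API: the `Expr` container, the `call` method and its signature, and the `AtomicOp` helper are all hypothetical names chosen for illustration, and the `BuiltinFunc` defined here is a stand-in for the real base class.

```python
# Toy model only: every class here is a hypothetical stand-in,
# not CuPy's actual BuiltinFunc machinery.
from dataclasses import dataclass


@dataclass
class Expr:
    """A fragment of generated CUDA source plus its C type name."""
    code: str
    ctype: str


class BuiltinFunc:
    """Stand-in base class: a subclass describes how one call is lowered."""

    def call(self, *args: Expr) -> Expr:
        raise NotImplementedError


class AtomicOp(BuiltinFunc):
    """Lowers `op(array, index, value)` to `op(&array[index], value)`."""

    def __init__(self, cuda_name: str):
        self.cuda_name = cuda_name

    def call(self, array: Expr, index: Expr, value: Expr) -> Expr:
        # CUDA atomics take a pointer to the target element
        # and return the value that was stored there before the update.
        code = f"{self.cuda_name}(&{array.code}[{index.code}], {value.code})"
        return Expr(code, value.ctype)


atomic_add = AtomicOp("atomicAdd")
result = atomic_add.call(
    Expr("x", "float*"), Expr("i", "int"), Expr("1.0f", "float"))
print(result.code)  # atomicAdd(&x[i], 1.0f)
```

The point of the sketch is only that each CUDA function maps to a small class that knows how to emit its call; the real implementation also has to handle type inference and argument checking, which are omitted here.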

Math modules: https://docs.nvidia.com/cuda/cuda-math-api/
- math module functions
- cupy ufunc
- CuPy JIT specific interface

Atomic Functions: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions
- atomicAdd #5169
- atomicSub #5387
- atomicExch #5387
- atomicMin #5387
- atomicMax #5387
- atomicInc #5387
- atomicDec #5387
- atomicCAS #5387
- atomicAnd #5387
- atomicOr #5387
- atomicXor #5387

Warp intrinsics: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions
- Warp shuffle functions #5387
- Warp match functions
- Warp reduce functions
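For anyone adding or testing the less obvious atomics in the list, a sequential reference model can be handy. The sketch below follows the semantics given in the CUDA C Programming Guide for atomicCAS, atomicInc and atomicDec. The function names and the list-based "memory" are assumptions made for illustration, values are assumed to be non-negative integers (the CUDA versions of atomicInc and atomicDec operate on unsigned int), and none of this is CuPy JIT code or captures real atomicity.

```python
# Sequential reference model of three CUDA atomics.
# On a GPU each read-modify-write below is a single indivisible operation.

def atomic_cas(mem, i, compare, val):
    """Store val only if the current value equals compare; return the old value."""
    old = mem[i]
    mem[i] = val if old == compare else old
    return old


def atomic_inc(mem, i, val):
    """Increment, wrapping to 0 once the old value reaches val; return the old value."""
    old = mem[i]
    mem[i] = 0 if old >= val else old + 1
    return old


def atomic_dec(mem, i, val):
    """Decrement, wrapping to val when the old value is 0 or above val; return the old value."""
    old = mem[i]
    mem[i] = val if (old == 0 or old > val) else old - 1
    return old


counter = [2]
print(atomic_inc(counter, 0, 3), counter)  # 2 [3]
print(atomic_inc(counter, 0, 3), counter)  # 3 [0]
print(atomic_dec(counter, 0, 3), counter)  # 0 [3]
```

A model like this could serve as the expected-value oracle when comparing results copied back from a kernel, as long as the kernel's operations are applied in a known order.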

Issue Analytics

Created 2 years ago
Reactions: 1
Comments: 6 (6 by maintainers)

Top GitHub Comments

asi1024 commented, Apr 4, 2022

@leofang I find it difficult to solve this problem from the ThreadGroup class definition. I have opened a PR that combines declarations and initializations (#6619).

leofang commented, Jun 17, 2021

xref: https://numba.readthedocs.io/en/stable/cuda-reference/kernel.html
