Support more CUDA API in CuPy JIT
Part of #4290
CuPy JIT needs more coverage of the functions and attributes supported in CUDA.
Each function can be implemented in CuPy JIT by writing a class that inherits from `BuiltinFunc`.
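As a rough illustration of the "one class per CUDA function" pattern described above, here is a minimal, self-contained sketch. The `Data` and `BuiltinFunc` classes below are hypothetical stand-ins written for this example, not CuPy's actual internal classes, and the `call` signature is an assumption. The idea is only that each subclass translates a Python-level call into a CUDA source fragment plus a result type.

```python
# Hypothetical stand-ins for illustration only; CuPy's real BuiltinFunc
# lives in its JIT internals and has a different, richer interface.
from dataclasses import dataclass


@dataclass
class Data:
    """A generated CUDA expression together with its C type name."""
    code: str
    ctype: str


class BuiltinFunc:
    """Base class: subclasses map a Python-level call to CUDA source."""

    def call(self, *args: Data) -> Data:
        raise NotImplementedError


class SyncThreads(BuiltinFunc):
    """Maps syncthreads() to CUDA's __syncthreads()."""

    def call(self, *args: Data) -> Data:
        if args:
            raise TypeError("syncthreads() takes no arguments")
        return Data("__syncthreads()", "void")


class Rsqrt(BuiltinFunc):
    """Maps rsqrt(x) to the CUDA math API, chosen by argument type."""

    # rsqrtf (float) and rsqrt (double) are CUDA math API functions.
    _names = {"float": "rsqrtf", "double": "rsqrt"}

    def call(self, *args: Data) -> Data:
        if len(args) != 1:
            raise TypeError("rsqrt() takes exactly one argument")
        (x,) = args
        if x.ctype not in self._names:
            raise TypeError(f"rsqrt() does not support type {x.ctype!r}")
        return Data(f"{self._names[x.ctype]}({x.code})", x.ctype)


if __name__ == "__main__":
    print(Rsqrt().call(Data("x[i]", "float")).code)
    print(SyncThreads().call().code)
```

In this sketch, type dispatch happens when the call is translated, which is one plausible way to cover the float and double variants of a math function with a single class.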
- Math modules: https://docs.nvidia.com/cuda/cuda-math-api/
  - math module functions
  - cupy ufunc
  - CuPy JIT specific interface
- Atomic Functions: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions
- Warp intrinsics: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions
  - Warp shuffle functions #5387
  - Warp match functions
  - Warp reduce functions
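For readers unfamiliar with the warp shuffle entry in the list above, the following pure-Python model sketches what CUDA's `__shfl_down_sync` does when called with a full mask, and how it is typically used for a warp-level sum. This is a CPU simulation for illustration, not CuPy JIT code. The helper names `shfl_down` and `warp_reduce_sum` are made up for this example.

```python
WARP_SIZE = 32


def shfl_down(values, delta):
    """Model __shfl_down_sync with a full mask: lane i reads lane i + delta.

    Lanes whose source lane would fall past the end of the warp
    keep their own value.
    """
    return [
        values[i + delta] if i + delta < WARP_SIZE else values[i]
        for i in range(WARP_SIZE)
    ]


def warp_reduce_sum(values):
    """Tree reduction over one warp; lane 0 ends up holding the total."""
    vals = list(values)
    if len(vals) != WARP_SIZE:
        raise ValueError("expected exactly one value per lane")
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        # Every lane adds the value it received from the lane `offset` above.
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


if __name__ == "__main__":
    print(warp_reduce_sum(range(WARP_SIZE)))
```

Only lane 0 is guaranteed to hold the full sum after the loop. The upper lanes accumulate partial or duplicated values, which matches how the idiom behaves on a GPU.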
Top GitHub Comments
@leofang I find it difficult to solve this problem from the `ThreadGroup` class definition. I have opened a PR that combines declarations and initializations! (#6619)

xref: https://numba.readthedocs.io/en/stable/cuda-reference/kernel.html