extern "C" Feature Request for RawKernel and RawModule
Hello, I would like to request a wrapper for raw CUDA kernels that ensures that extern "C" is applied to CUDA source code when using RawKernels and RawModules.
Due to the lack of documentation as of version 8.0, it is difficult for inexperienced users to find out why their functions will not compile.
PyCUDA has a "no_extern_c" flag in its functions that is set to False by default.
I think that this feature will be helpful for those moving from PyCUDA to CuPy.
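As a rough illustration of what is being asked for, the following is a minimal user-side sketch, not part of CuPy. The helper name with_extern_c, the add_one kernel, and run_add_one are hypothetical and introduced only for this example. The helper wraps the whole source in an extern "C" block, which is only safe when the source contains no C++ templates or C++-only headers.

```python
def with_extern_c(source: str) -> str:
    """Wrap CUDA source in an extern "C" block so kernel names are not mangled.

    Hypothetical user-side helper; CuPy itself does no such preprocessing.
    """
    return 'extern "C" {\n' + source + '\n}\n'


# Illustrative kernel written without any extern "C" of its own.
KERNEL_SOURCE = r"""
__global__ void add_one(float* x, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        x[i] += 1.0f;
    }
}
"""


def run_add_one(n: int = 1024):
    """Compile and launch the kernel; needs CuPy and a CUDA-capable GPU."""
    import cupy as cp  # imported lazily so the helper above works without a GPU

    # With the wrapper applied, the kernel can be fetched by its plain name.
    kernel = cp.RawKernel(with_extern_c(KERNEL_SOURCE), "add_one")
    x = cp.zeros(n, dtype=cp.float32)
    threads = 256
    blocks = (n + threads - 1) // threads
    # Scalar arguments must match the C types in the signature, hence int32.
    kernel((blocks,), (threads,), (x, cp.int32(n)))
    return x
```

Calling run_add_one() on a machine with a GPU should return an array of ones. Without the wrapper, looking up "add_one" would be expected to fail, because the compiler emits a C++-mangled symbol name instead.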
Issue Analytics
- State:
- Created 3 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Hi @veritas9872, thanks for the suggestion, and sorry to hear your frustration from your migration journey. The extern "C" usage is demonstrated in the Tutorial section of the doc. The documentation might not be obvious and certainly has room for improvement, as always.

However, I think your suggestion would lead to backward incompatibility and so would likely break existing code. In particular, my original intention for RawModule (#2389, added in CuPy v7.0) was to load external source code, which should already have proper C bindings set up in order to fetch kernels by name. In addition, unlike in PyCUDA, our RawModule also supports C++ template kernels via name_expressions, which does not require an enclosing extern "C" block at all. All these complexities together would make code preprocessing convoluted, which is something I strive to avoid, as recently mentioned in another (kinda related) issue (#4246). (In fact we don't do any preprocessing and let the compiler handle it directly.)

Another minor point is that PyCUDA's no_extern_c flag is a bit unintuitive because it is a double negative.

IMHO the burden is on those who are brave enough to bring custom CUDA C/C++ code to Python, and the intrusion of RawKernel/RawModule is kept to a minimum for performance considerations, with the assumption that these users know what they're doing. The usage and practice here are no different from using the CUDA driver API in C. Just my two cents.

@leofang Thank you for the explanation! I could not make my code compile and resorted to using backend='nvcc' because NVRTC could not find several header files. After hearing your explanation, I think that the best solution would be to have more detailed documentation with guides on how to deal with some common pitfalls. Perhaps a FAQ page or a tutorial on Medium could be created. I understand that RawKernel and RawModule are both relatively new and not frequently used. Even after several hours of googling, I could only find a few basic examples with simple kernels, whereas I needed to plug in an entire GitHub repository containing a CUDA project that was originally designed for MATLAB via mex. While not many users are doing this right now, I believe that, given the many advantages of CuPy over PyCUDA, in-depth tutorials on bringing raw kernels and CUDA projects into CuPy would persuade many to use CuPy.
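The name_expressions route mentioned by the maintainer, together with the header problem described in the reply, can be sketched as follows. This is a minimal sketch under stated assumptions: the scale kernel, the include_options helper, and run_scale are hypothetical names made up for this example, and it assumes that missing headers can be located by passing -I<dir> compiler options. It is not a guaranteed fix for every NVRTC header error.

```python
# Template kernel: note that there is no extern "C" block anywhere.
TEMPLATE_SOURCE = r"""
template <typename T>
__global__ void scale(T* x, T factor, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        x[i] *= factor;
    }
}
"""

# Specializations to instantiate; CuPy resolves the mangled names for us.
NAME_EXPRESSIONS = ["scale<float>", "scale<double>"]


def include_options(include_dirs):
    """Turn directories into -I compiler options (hypothetical helper)."""
    return tuple("-I" + str(d) for d in include_dirs)


def run_scale(include_dirs=()):
    """Compile and launch scale<float>; needs CuPy and a CUDA-capable GPU."""
    import cupy as cp  # imported lazily so the helpers above work without a GPU

    options = ("-std=c++11",) + include_options(include_dirs)
    module = cp.RawModule(
        code=TEMPLATE_SOURCE,
        options=options,
        name_expressions=NAME_EXPRESSIONS,
    )
    # Fetch the specialization by the same expression that was registered.
    kernel = module.get_function("scale<float>")
    x = cp.arange(8, dtype=cp.float32)
    # Scalars are typed explicitly to match T=float and int in the signature.
    kernel((1,), (8,), (x, cp.float32(2.0), cp.int32(8)))
    return x
```

On a machine with a GPU, run_scale() should double each element of the input array. A project with its own headers would pass its directories, for example run_scale(include_dirs=["path/to/project/include"]), where the path is a placeholder.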