extern "C" Feature Request for RawKernel and RawModule

Hello, I would like to request a wrapper for raw CUDA kernels that ensures extern "C" is applied to the CUDA source code when using RawKernel and RawModule.

Due to the lack of documentation as of version 8.0, it is difficult for inexperienced users to find out why their functions will not compile.

PyCUDA has a “no_extern_c” flag in its functions that is set to False by default, so source code is wrapped in extern "C" unless the user opts out.

I think that this feature will be helpful for those moving from PyCUDA to CuPy.
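To make the request concrete, here is a minimal sketch of the kind of wrapping that PyCUDA's flag controls. The helper name `wrap_extern_c` is hypothetical and is not part of CuPy or PyCUDA; it only illustrates the string preprocessing being asked for, under the assumption that the whole source is enclosed in a single extern "C" block.

```python
def wrap_extern_c(source: str, no_extern_c: bool = False) -> str:
    """Hypothetical helper: enclose CUDA source in an extern "C" block.

    Mirrors the spirit of PyCUDA's ``no_extern_c`` flag: the source is
    wrapped by default and left untouched when ``no_extern_c`` is True.
    """
    if no_extern_c:
        return source
    return 'extern "C" {\n' + source + '\n}\n'


KERNEL_SRC = """
__global__ void my_add(const float* x1, const float* x2, float* y) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    y[tid] = x1[tid] + x2[tid];
}
"""

# With C linkage the kernel keeps its plain name, "my_add", instead of a
# C++-mangled one, so it can be looked up by that name after compilation.
wrapped = wrap_extern_c(KERNEL_SRC)
```

A wrapper like this would break templated kernels or sources that include C++ headers, which is why an opt-out flag is needed at all.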

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

3 reactions
leofang commented, Nov 18, 2020

Hi @veritas9872, thanks for the suggestion, and sorry to hear about the frustration during your migration. The extern "C" usage is demonstrated in the Tutorial section of the docs. The documentation might not be obvious and certainly has room for improvement, as always.

However, I think your suggestion would be backward incompatible and so would likely break existing code. In particular, my original intention for RawModule (#2389, added in CuPy v7.0) was to load external source code, which should already have proper C bindings set up so that functions can be fetched by name. In addition, unlike PyCUDA, our RawModule also supports C++ template kernels via name_expressions, which does not require an enclosing extern "C" block at all. All these complexities together would make the code preprocessing convoluted, which is something I strive to avoid, as recently mentioned in another (kinda related) issue (#4246). (In fact we don’t do any preprocessing and let the compiler handle the source directly.)

Another minor point is that PyCUDA’s no_extern_c flag is a bit unintuitive because it is a double negative.

IMHO the burden is on those who are brave enough to bring custom CUDA C/C++ code to Python. The intrusion of RawKernel/RawModule is kept to a minimum for performance reasons, on the assumption that these users know what they’re doing. The usage and practice here are no different from using the CUDA driver API in C. Just my two cents.
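The two cases contrasted in this comment can be sketched as follows, adapted from the style of the CuPy documentation examples. The kernel names (`my_add`, `scale3`) and the `demo` function are illustrative assumptions, and running `demo()` requires CuPy and a CUDA-capable GPU, so the import is deferred until it is called.

```python
# Case 1: plain CUDA C source. extern "C" disables C++ name mangling, so the
# kernel can be fetched by its plain name "my_add".
ADD_SRC = r'''
extern "C" __global__
void my_add(const float* x1, const float* x2, float* y, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) y[tid] = x1[tid] + x2[tid];
}
'''

# Case 2: templated C++ source. There is no extern "C" block; the mangled
# names are resolved through name_expressions instead.
SCALE_SRC = r'''
template<typename T>
__global__ void scale3(T* arr, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) arr[tid] = arr[tid] * 3;
}
'''
SCALE_NAMES = ["scale3<float>", "scale3<double>"]


def demo():
    import cupy as cp  # deferred: needs CuPy and a CUDA-capable GPU

    n = 16
    x1 = cp.arange(n, dtype=cp.float32)
    x2 = cp.ones(n, dtype=cp.float32)
    y = cp.zeros(n, dtype=cp.float32)

    # RawKernel looks the function up by the unmangled name.
    add = cp.RawKernel(ADD_SRC, "my_add")
    add((1,), (n,), (x1, x2, y, cp.int32(n)))  # grid, block, kernel args

    # RawModule instantiates the listed template specializations.
    mod = cp.RawModule(code=SCALE_SRC, options=("-std=c++11",),
                       name_expressions=SCALE_NAMES)
    scale_f32 = mod.get_function("scale3<float>")
    a = cp.arange(n, dtype=cp.float32)
    scale_f32((1,), (n,), (a, cp.int32(n)))
    return y, a
```

Wrapping `SCALE_SRC` in extern "C" automatically would make the template fail to compile, which illustrates why a blanket wrapper is hard to add without preprocessing the source.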

1 reaction
veritas9872 commented, Nov 18, 2020

@leofang Thank you for the explanation! I could not make my code compile and resorted to using backend='nvcc' because NVRTC could not find several header files. After hearing your explanation, I think the best solution would be more detailed documentation, with guides on how to deal with common pitfalls. Perhaps a FAQ page or a tutorial on Medium could be created.

I understand that RawKernel and RawModule are both relatively new and not frequently used. Even after several hours of googling, I could only find a few basic examples with simple kernels, whereas I needed to plug in an entire GitHub repository containing a CUDA project that was originally designed for MATLAB via mex. Not many users are doing this right now, despite the many advantages of CuPy over PyCUDA, but I believe that in-depth tutorials on bringing raw kernels and CUDA projects into CuPy would persuade many to use CuPy.
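For the header problem mentioned in this comment, one option is to pass include directories to the compiler through `options`, and to switch to `backend='nvcc'` only if that is not enough. The sketch below assembles keyword arguments for `cupy.RawModule`; the helper name `raw_module_kwargs` and the include path are hypothetical, and whether `-I` alone resolves a given project's headers under NVRTC is an assumption that has to be checked case by case.

```python
def raw_module_kwargs(code, include_dirs=(), use_nvcc=False):
    """Hypothetical helper: build keyword arguments for cupy.RawModule.

    include_dirs become -I compiler options; use_nvcc selects the nvcc
    backend instead of the default NVRTC backend.
    """
    options = tuple("-I" + d for d in include_dirs)
    backend = "nvcc" if use_nvcc else "nvrtc"
    return {"code": code, "options": options, "backend": backend}


SRC = r'''
extern "C" __global__ void noop() {}
'''

# Hypothetical include directory, for illustration only.
kwargs = raw_module_kwargs(SRC, include_dirs=["/opt/project/include"],
                           use_nvcc=True)
```

With CuPy installed and a GPU available, the module would then be created with `cupy.RawModule(**kwargs)`.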

Top Results From Across the Web

Allow C++ Templating Functionality #3185 - cupy/cupy - GitHub
When specializing a template function, its name gets mangled (note that the template specialization can't be done with the extern "C" linkage to...

cupy.RawModule using name_expressions and nvcc and/or ...
Before the name_expressions parameter to RawModule in CuPy 8.0.0, I had to copy the c++-mangled names into the get_function() method manually of ...

cupy.RawModule — CuPy 11.4.0 documentation
An RawKernel instance. Return type. RawKernel. Note. The following example shows how to retrieve one of the specialized C++ template kernels:.

CuPy Documentation - Read the Docs
Raw Kernel : Import existing CUDA C/C++ code ... Part of the CUDA features in CuPy will be activated only when the corresponding...

cupy/community - Gitter
... NumPy Random Generator API, improved AMD ROCm support and other features. ... out if I can somehow use cub block-wide collectives in...
