documenting cupy.cuda.function.Module?


I was working with an Nvidia engineer, @laytonjb, to help a group of scientists migrate their Python codebase to GPU, and he suggested using cupy. After some experiments I realized that cupy provides functionality almost identical to pycuda.driver.module_from_file, namely loading a precompiled cubin (CUDA binary) and grabbing the kernels therein. We wonder why this great feature is not documented at all (I hope we didn’t miss anything!). IMHO this is a huge attraction for pycuda users (like me), and for anyone who needs the flexibility of occasionally working with low-level CUDA kernels. Several issues related to JIT compilation (such as #1258, #1398, and more recently #1655) could’ve been less urgent if this had been documented in the first place.

For people who are looking for this feature, below are the steps that worked perfectly for us. Suppose we have a file named cupy_mod.cu, defined as follows:

extern "C" { // avoid C++ name mangling
// C = A * B, so Ay (second dimension of A) must equal Bx (first dimension of B)
__global__ void mat_mul(double * A, double * B, double * C, int Ax, int Bx, int By) {
    /* implementation goes here */
}

/* other kernels */
}

then the steps to take are:

  1. compile the .cu file to .cubin (CUDA binary) with nvcc -arch=sm_XX -cubin -o cupy_mod.cubin cupy_mod.cu
  2. load it in Python
import cupy as cp

# create a Module object in python
mod = cp.cuda.function.Module()

# load the cubin
mod.load_file("/path/to/cupy_mod.cubin")

# fetch the kernel to make it a Python function object
mat_mul_cp = mod.get_function("mat_mul")

# declare A, B, C as 2D cupy arrays of dtype cp.float64
# be sure they are C contiguous arrays!
# ...omitted...

# call the function with a tuple of grid size, a tuple of block size, and a tuple of all arguments required by the kernel
# if the kernel requires shared memory, append `shared_mem=n_bytes` to the function call
# unused grid and block dimensions are set to 1 (a CUDA launch dimension cannot be 0)
mat_mul_cp(((A.shape[1] + 128 - 1) // 128, 1, 1), (128, 1, 1), (A, B, C, cp.int32(A.shape[0]), cp.int32(B.shape[0]), cp.int32(B.shape[1])))
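
The grid size in the call above is a ceiling division: just enough blocks of 128 threads to cover A.shape[1] items. Here is a minimal sketch of that arithmetic in plain Python, assuming a 1D launch. The helper name launch_dims_1d is hypothetical and not part of cupy.

```python
def launch_dims_1d(n_items, threads_per_block=128):
    """Return (grid, block) tuples for a 1D launch that covers n_items.

    Hypothetical helper for illustration; it only does the integer math.
    """
    if n_items <= 0 or threads_per_block <= 0:
        raise ValueError("n_items and threads_per_block must be positive")
    # Ceiling division: round up so a partially filled last block is included.
    n_blocks = (n_items + threads_per_block - 1) // threads_per_block
    # Unused dimensions are set to 1.
    return (n_blocks, 1, 1), (threads_per_block, 1, 1)


grid, block = launch_dims_1d(1000)
print(grid, block)  # (8, 1, 1) (128, 1, 1)
```

The two tuples can be passed as the first two arguments of the kernel call. Because the grid usually spans slightly more threads than items (here 8 * 128 = 1024 threads for 1000 items), the kernel itself would need a bounds check on its thread index.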

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions:1
  • Comments:11 (10 by maintainers)

Top GitHub Comments

1 reaction
leofang commented, May 28, 2019

It’s in #1889 but is not merged to master yet.

1 reaction
leofang commented, Dec 4, 2018

@kmaehashi ok I’ll try

