Documenting cupy.cuda.function.Module?
I was working with an NVIDIA engineer, @laytonjb, to help a group of scientists migrate their Python codebase to GPUs, and he suggested using CuPy. After some experiments I realized that CuPy provides almost the same functionality as pycuda.driver.module_from_file, namely loading a precompiled cubin (CUDA binary) and grabbing the kernels in it. We wonder why this great feature is not documented at all (I hope we didn't miss anything!). IMHO it is a huge attraction for PyCUDA users (like me) and for anyone who needs the flexibility of occasionally working with low-level CUDA kernels. Several issues related to JIT compilation (such as #1258, #1398, and more recently #1655) could have been less urgent if this had been documented in the first place.
For anyone looking for this feature, below are the steps that worked perfectly for us.
Suppose we have a file named cupy_mod.cu, defined as follows:
extern "C"{ //avoid C++ name mangling
//C=A*B, so Ay=Bx
__global__ void mat_mul(double * A, double * B, double * C, int Ax, int Bx, int By) {
/* implementation goes here */
}
/* other kernels */
}
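The kernel only receives raw `double *` pointers, so it has to treat each 2D matrix as a flat row-major buffer; that is why the arrays must be C-contiguous. Since the kernel body is omitted above, here is a hypothetical CPU reference (`mat_mul_reference` is an illustrative name, not part of CuPy) with the same argument list. It assumes a plain product C[i, j] = sum over k of A[i, k] * B[k, j], with A of shape (Ax, Bx), B of shape (Bx, By) and C of shape (Ax, By). It is a sketch for checking a GPU result on small inputs, not the author's kernel.

```python
import numpy as np


def mat_mul_reference(A_flat, B_flat, C_flat, Ax, Bx, By):
    """Hypothetical CPU mirror of mat_mul(double *A, double *B, double *C, int Ax, int Bx, int By).

    A_flat, B_flat and C_flat are 1-D views of C-contiguous (row-major)
    buffers, which is all a __global__ kernel sees through its pointers.
    Assumes C = A @ B with A: (Ax, Bx), B: (Bx, By), C: (Ax, By).
    """
    for i in range(Ax):          # row of A and of C
        for j in range(By):      # column of B and of C
            acc = 0.0
            for k in range(Bx):  # shared inner dimension (Ay == Bx)
                acc += A_flat[i * Bx + k] * B_flat[k * By + j]
            C_flat[i * By + j] = acc


A = np.arange(6, dtype=np.float64).reshape(2, 3)
B = np.arange(12, dtype=np.float64).reshape(3, 4)
C = np.zeros((2, 4), dtype=np.float64)
# ravel() on a C-contiguous array returns a view, so the writes land in C
mat_mul_reference(A.ravel(), B.ravel(), C.ravel(), A.shape[0], B.shape[0], B.shape[1])
print(np.allclose(C, A @ B))  # True
```

A transposed view such as `B.T` is not C-contiguous, so its underlying buffer does not follow this index math; passing `cp.ascontiguousarray(B.T)` to the kernel instead avoids that.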
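Then the steps are:
- Compile the .cu file to a .cubin (CUDA binary), replacing XX with the compute capability of the target GPU (e.g. sm_70 for a V100), since a cubin is tied to the GPU architecture it was built for:
nvcc -arch=sm_XX -cubin -o cupy_mod.cubin cupy_mod.cu
- Load it in Python: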
import cupy as cp
# create a Module object in python
mod = cp.cuda.function.Module()
# load the cubin
mod.load_file("/path/to/cupy_mod.cubin")
# fetch the kernel to make it a Python function object
mat_mul_cp = mod.get_function("mat_mul")
# declare A, B, C as 2D cupy arrays of dtype cp.float64
# be sure they are C contiguous arrays!
# ...omitted...
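# call the function with a tuple of grid size, a tuple of block size, and a tuple of all arguments required by the kernel
# every grid and block dimension must be at least 1 (a 0 makes the launch invalid), so unused dimensions are 1
# wrap Python ints in cp.int32 so they match the kernel's `int` parameters
# if the kernel requires shared memory, append `shared_mem=n_bytes` to the function call
mat_mul_cp(((A.shape[1] + 128 - 1) // 128, 1, 1), (128, 1, 1), (A, B, C, cp.int32(A.shape[0]), cp.int32(B.shape[0]), cp.int32(B.shape[1])))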
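Two details of the launch above are easy to get wrong: every grid and block dimension must be at least 1, and the arguments must match the kernel's C types exactly (CuPy passes a plain Python `int` as a 64-bit integer, while the kernel expects a 32-bit `int`). The helpers below are a hypothetical sketch; `launch_config` and `mat_mul_args` are illustrative names, not CuPy API. They take the array module `xp` as a parameter, so they can be dry-run with NumPy on a machine without a GPU and then used with `cupy` unchanged. Following the original call, the grid covers `A.shape[1]` threads along x, which is an assumption about the omitted kernel body.

```python
import numpy as np


def launch_config(n_threads, block_size=128):
    """Hypothetical helper: 1-D launch covering n_threads threads along x.

    Returns (grid, block) as 3-tuples; the unused y/z dimensions are 1,
    because a 0 in any dimension makes the launch invalid.
    """
    if n_threads < 1 or block_size < 1:
        raise ValueError("n_threads and block_size must be positive")
    grid_x = (n_threads + block_size - 1) // block_size  # ceiling division
    return (grid_x, 1, 1), (block_size, 1, 1)


def mat_mul_args(xp, A, B, C):
    """Hypothetical helper: argument tuple for
    mat_mul(double *A, double *B, double *C, int Ax, int Bx, int By).

    xp is the array module (numpy for a dry run, cupy for the real launch).
    """
    # Inputs can safely be copied into C-contiguous float64 buffers ...
    A = xp.ascontiguousarray(A, dtype=xp.float64)
    B = xp.ascontiguousarray(B, dtype=xp.float64)
    # ... but C is the output, so it must already be usable in place.
    if C.dtype != xp.float64 or not C.flags.c_contiguous:
        raise ValueError("C must be a C-contiguous float64 array")
    if A.shape[1] != B.shape[0] or C.shape != (A.shape[0], B.shape[1]):
        raise ValueError("expected A (Ax, Bx), B (Bx, By), C (Ax, By)")
    # Explicit int32 so the scalars match the kernel's 32-bit `int` parameters.
    return (A, B, C, xp.int32(A.shape[0]), xp.int32(B.shape[0]), xp.int32(B.shape[1]))


# Dry run with NumPy (no GPU needed). With CuPy and the module loaded as above:
#   grid, block = launch_config(A.shape[1])
#   mat_mul_cp(grid, block, mat_mul_args(cp, A, B, C))
grid, block = launch_config(300)
print(grid, block)  # (3, 1, 1) (128, 1, 1)

A = np.arange(6).reshape(2, 3)   # int64 input, converted to float64
B = np.ones((4, 3)).T            # transposed view: not C-contiguous
C = np.zeros((2, 4))
args = mat_mul_args(np, A, B, C)
print(args[1].flags.c_contiguous, type(args[3]).__name__)  # True int32
```

Copying the inputs but only checking the output is deliberate: if C were silently copied, the kernel would write into the copy and the caller's array would stay unchanged.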
Issue created 5 years ago.
Top GitHub Comments
It’s in #1889 but is not merged to master yet.
@kmaehashi ok I’ll try