RawModule options ignored when loading PTX/CUBIN
CuPy Version : 7.6.0
CUDA Root : /home/belt/anaconda3/envs/cusignal
CUDA Build Version : 10010
CUDA Driver Version : 11000
CUDA Runtime Version : 10010
cuBLAS Version : 10201
cuFFT Version : 10101
cuRAND Version : 10101
cuSOLVER Version : (10, 2, 0)
cuSPARSE Version : 10300
NVRTC Version : (10, 1)
cuDNN Build Version : 7605
cuDNN Version : 7600
NCCL Build Version : 2406
NCCL Runtime Version : 2706
CUB Version : None
cuTENSOR Version : None
I’m noticing that options passed to RawModule are ignored when loading PTX or cubin. I don’t believe this is the correct behavior. As an example, -use_fast_math can affect the codegen (PTX -> cubin).
module = cp.RawModule(
    path=dir + '/spectral_analysis/_spectral.ptx',
    options=("-std=c++11", "-use_fast_math")
)
_cupy_kernel_cache[(str(np_type), k_type.value)] = module.get_function(
    "_cupy_lombscargle"
)
print(module.options)
Output:
()
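As a stopgap on the caller's side, one option is to check up front whether a path points at precompiled code and warn when options would be dropped. The snippet below is only a sketch: `effective_options` is a hypothetical helper (not a CuPy API), the suffix list is an assumption, and it simply mirrors the behavior reported above rather than anything CuPy documents.

```python
import warnings
from pathlib import Path

# Assumption: file suffixes treated as already-compiled device code.
PRECOMPILED_SUFFIXES = {".ptx", ".cubin", ".fatbin"}


def effective_options(path, options=()):
    """Return the options expected to take effect for `path`.

    Hypothetical helper, not part of CuPy. It mirrors the behavior
    reported above: options are dropped when the input is precompiled.
    """
    options = tuple(options)
    if Path(path).suffix.lower() in PRECOMPILED_SUFFIXES:
        if options:
            # Make the silent drop visible to the caller.
            warnings.warn(
                f"options {options} are ignored when loading {path}",
                RuntimeWarning,
                stacklevel=2,
            )
        return ()
    return options
```

Calling `effective_options("_spectral.ptx", ("-std=c++11", "-use_fast_math"))` returns an empty tuple and emits a `RuntimeWarning`, while a source file path passes the options through unchanged.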
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
Okay, I have a few more breadcrumbs.
-Xptxas options only apply to Stage 2 compilation (PTX -> SASS). This is why we can’t see warnings like -warn-spills during the NVRTC process. I’m curious what would happen if you were to swap out cuModuleLoad() for cuModuleLoadDataEx(). You should be able to pass options and parameters. Also, according to the guide, you should be able to retrieve option outputs and possible error codes from previous, async launches. Note: Only a subset of -Xptxas options is allowed, via specific values of the CUjit_option enum.
nvcc and nvrtc backend.
Fun fact! This morning I was able to pass a fatbin to a RawModule. I wanted binary code for all architectures (sm_35 -> sm_75 (CUDA 10.2)) and a single PTX code for the latest architecture (sm_75). It looks like the correct binary is being retrieved. Using a fatbin instead of multiple cubin and PTX files saves space and means fewer files to maintain and less logic to choose the correct binary.
I tested the following scenarios and everything is working as expected (I believe)
@leofang Give me a day to dive a little deeper. It’s completely possible I’m misunderstanding something and I’m responsible for passing parameters to the compiler. I thought some flags might affect the outcome from PTX -> cubin. I’ve never worried about this before so forgive me if I misspoke.
I asked internally, but maybe I didn’t word my question well.
I will also look at cuModuleLoad() and cuModuleLoadDataEx().
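For anyone comparing the two loaders: cuModuleLoadDataEx() does not take command-line strings; it takes parallel arrays of CUjit_option values and their arguments. The sketch below shows roughly what translating a few ptxas-style flags into such pairs could look like. It is an illustration only: `CUjitOption` and `to_jit_options` are hypothetical names, the flag-to-option mapping is my assumption, and the numeric values are reproduced from memory of cuda.h, so they should be checked against your toolkit's header before use.

```python
from enum import IntEnum


class CUjitOption(IntEnum):
    """Subset of the driver API's CUjit_option enum.

    Assumption: numeric values as listed in cuda.h; verify before use.
    """
    MAX_REGISTERS = 0         # CU_JIT_MAX_REGISTERS
    OPTIMIZATION_LEVEL = 7    # CU_JIT_OPTIMIZATION_LEVEL
    GENERATE_DEBUG_INFO = 11  # CU_JIT_GENERATE_DEBUG_INFO
    LOG_VERBOSE = 12          # CU_JIT_LOG_VERBOSE
    GENERATE_LINE_INFO = 13   # CU_JIT_GENERATE_LINE_INFO


def to_jit_options(flags):
    """Split ptxas-style flags into JIT (option, value) pairs and the rest.

    Hypothetical helper. Flags this sketch does not map are returned in
    the second list so the caller can see what would be dropped.
    """
    pairs, unmapped = [], []
    for flag in flags:
        name, _, value = flag.partition("=")
        if name == "-maxrregcount" and value.isdigit():
            pairs.append((CUjitOption.MAX_REGISTERS, int(value)))
        elif flag.startswith("-O") and flag[2:].isdigit():
            pairs.append((CUjitOption.OPTIMIZATION_LEVEL, int(flag[2:])))
        elif flag == "-g":
            pairs.append((CUjitOption.GENERATE_DEBUG_INFO, 1))
        elif flag == "-lineinfo":
            pairs.append((CUjitOption.GENERATE_LINE_INFO, 1))
        elif flag == "-v":
            pairs.append((CUjitOption.LOG_VERBOSE, 1))
        else:
            unmapped.append(flag)
    return pairs, unmapped
```

With this sketch, flags such as -warn-spills or -use_fast_math end up in the unmapped list, which is one way to surface to the user that they would have no effect at module load time.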