
RawModule options ignored when loading PTX/CUBIN


CuPy Version          : 7.6.0
CUDA Root             : /home/belt/anaconda3/envs/cusignal
CUDA Build Version    : 10010
CUDA Driver Version   : 11000
CUDA Runtime Version  : 10010
cuBLAS Version        : 10201
cuFFT Version         : 10101
cuRAND Version        : 10101
cuSOLVER Version      : (10, 2, 0)
cuSPARSE Version      : 10300
NVRTC Version         : (10, 1)
cuDNN Build Version   : 7605
cuDNN Version         : 7600
NCCL Build Version    : 2406
NCCL Runtime Version  : 2706
CUB Version           : None
cuTENSOR Version      : None

I’m noticing that options passed to RawModule are ignored when loading PTX or cubin files. I don’t believe this is the intended behavior: -use_fast_math, for example, can affect code generation (PTX -> cubin).

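import cupy as cp

# `dir`, `_cupy_kernel_cache`, `np_type`, and `k_type` come from the surrounding code (not shown)
module = cp.RawModule(
    path=dir + '/spectral_analysis/_spectral.ptx',
    options=("-std=c++11", "-use_fast_math"),
)
_cupy_kernel_cache[(str(np_type), k_type.value)] = module.get_function(
    "_cupy_lombscargle"
)
print(module.options)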

Output:

()

Created 3 years ago; 6 comments (6 by maintainers).

Top GitHub Comments


mnicely commented, Jul 7, 2020

Okay, I have a few more breadcrumbs.

1. Some -Xptxas options only apply to stage 2 compilation (PTX -> SASS). This is why we can’t see warnings like -warn-spills during the NVRTC process. I’m curious what happens if you swap cuModuleLoad() for cuModuleLoadDataEx(), which should let you pass options and their values. Also, according to the guide, you should be able to retrieve option outputs (such as the log buffers) and possible error codes from previous, asynchronous launches. Note: only a subset of -Xptxas options is supported, via specific values of the CUjit_option enum.
2. The process described in (1) should also allow us to retrieve output when files are loaded as PTX and JIT-compiled, for both the nvcc and NVRTC backends.
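A minimal sketch of the cuModuleLoadDataEx() idea above, called from Python via ctypes. It assumes Linux (`libcuda.so.1`), a CUDA context that is already current (for example, one created by CuPy), and CUjit_option values hand-copied from cuda.h, which you should verify against your own headers. `build_jit_options` and `load_ptx_with_logs` are hypothetical helpers, not part of CuPy.

```python
import ctypes

# Assumed CUjit_option values, hand-copied from cuda.h (CUDA 10.x/11.x); verify locally.
CU_JIT_INFO_LOG_BUFFER = 3
CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4
CU_JIT_ERROR_LOG_BUFFER = 5
CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6
CU_JIT_LOG_VERBOSE = 12


def build_jit_options(log_size=8192):
    """Build the parallel option/value arrays that cuModuleLoadDataEx expects."""
    info_log = ctypes.create_string_buffer(log_size)
    error_log = ctypes.create_string_buffer(log_size)
    keys = [
        CU_JIT_INFO_LOG_BUFFER,
        CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES,
        CU_JIT_ERROR_LOG_BUFFER,
        CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES,
        CU_JIT_LOG_VERBOSE,
    ]
    # Each value is pointer-sized: buffer addresses, buffer sizes, and 1 to enable verbose logs
    values = [
        ctypes.addressof(info_log),
        log_size,
        ctypes.addressof(error_log),
        log_size,
        1,
    ]
    options = (ctypes.c_int * len(keys))(*keys)
    option_values = (ctypes.c_void_p * len(values))(*values)
    return options, option_values, info_log, error_log


def load_ptx_with_logs(ptx: bytes):
    """JIT-load a PTX image and return (CUresult, CUmodule, info log, error log).

    Assumes a CUDA context is already current on the calling thread.
    """
    libcuda = ctypes.CDLL("libcuda.so.1")
    options, option_values, info_log, error_log = build_jit_options()
    module = ctypes.c_void_p()  # CUmodule is an opaque pointer
    status = libcuda.cuModuleLoadDataEx(
        ctypes.byref(module),
        ctypes.c_char_p(ptx + b"\0"),  # PTX images must be NUL-terminated
        ctypes.c_uint(len(options)),
        options,
        option_values,
    )
    return status, module, info_log.value.decode(), error_log.value.decode()
```

The log buffers are owned by Python and kept alive for the duration of the call, so the driver can write ptxas messages (for example, spill warnings) into them.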

Fun fact! This morning I was able to pass a fatbin to a RawModule. My scenario: I want binary code for all architectures (sm_35 through sm_75, CUDA 10.2) plus a single PTX for the latest architecture (sm_75). It looks like the correct binary is being selected. Using a fatbin instead of multiple cubin and PTX files saves space, means fewer files to maintain, and requires less logic to choose the correct binary.
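A sketch of how such a fatbin could be built, assuming nvcc’s `--fatbin` and `-gencode` flags; the file names `_spectral.cu` and `_spectral.fatbin` and the helper `fatbin_command` are hypothetical. Each `code=sm_XY` entry embeds a cubin, and the final `code=compute_XY` entry embeds PTX that the driver can JIT-compile for newer GPUs.

```python
def fatbin_command(source, output, sm_archs, ptx_arch):
    """Return an nvcc argument list for a fatbin with one cubin per arch plus one PTX."""
    cmd = ["nvcc", "--fatbin", "-o", output]
    for sm in sm_archs:
        # Real (SASS) code for each listed architecture
        cmd += ["-gencode", f"arch=compute_{sm},code=sm_{sm}"]
    # Virtual (PTX) code for the newest architecture
    cmd += ["-gencode", f"arch=compute_{ptx_arch},code=compute_{ptx_arch}"]
    cmd.append(source)
    return cmd
```

Usage would be something like `subprocess.run(fatbin_command("_spectral.cu", "_spectral.fatbin", [35, 50, 52, 53, 60, 61, 62, 70, 72, 75], 75), check=True)`, followed by `cp.RawModule(path="_spectral.fatbin")`.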

I tested the following scenarios, and everything is working as expected (I believe):

Binaries (35, 50, 52, 53, 60, 61, 62, 70, 72, 75) + PTX (75) — Works on RTX Titan (CC 7.5)
Binaries (35, 50, 52, 53, 60, 61, 62, 70, 72, 75) — Works on RTX Titan (CC 7.5)
Binaries (35, 50, 52, 53, 60, 61, 62, 70, 72) — Works on RTX Titan (CC 7.5)
Binaries (35, 50, 52, 53, 60, 61, 62, 70) — Works on RTX Titan (CC 7.5)
Binaries (35, 50, 52, 53, 60, 61, 62) — Doesn’t work on RTX Titan (CC 7.5)
Binaries (35, 50, 52, 53, 60, 61, 62) + PTX (62) — Works on RTX Titan (CC 7.5)
Binaries (35) — Doesn’t work on RTX Titan (CC 7.5)
Binaries (35) + PTX (35) — Works on RTX Titan (CC 7.5)
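These results appear consistent with CUDA’s documented compatibility rules: a cubin built for compute capability X.y runs only on devices of compute capability X.z with z ≥ y, while PTX for compute_XY can be JIT-compiled on any device of compute capability XY or higher. The small model below (hypothetical helper names) only decides whether *some* image in the fatbin is usable, not which one the driver actually picks, and it reproduces every row for a CC 7.5 device.

```python
def cubin_runs(sm: int, device_cc: int) -> bool:
    """A cubin for X.y runs on X.z only if the major versions match and z >= y."""
    sm_major, sm_minor = divmod(sm, 10)
    dev_major, dev_minor = divmod(device_cc, 10)
    return sm_major == dev_major and sm_minor <= dev_minor


def ptx_jits(compute: int, device_cc: int) -> bool:
    """PTX for compute_XY can be JIT-compiled on devices of compute capability >= XY."""
    return divmod(compute, 10) <= divmod(device_cc, 10)


def fatbin_loads(device_cc, sm_archs, ptx_archs=()):
    """True if at least one cubin or PTX entry in the fatbin is usable on the device."""
    return any(cubin_runs(s, device_cc) for s in sm_archs) or any(
        ptx_jits(p, device_cc) for p in ptx_archs
    )
```

For example, the row without 70/72/75 binaries fails because no sm_7x cubin is present and there is no PTX to fall back on.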


mnicely commented, Jul 6, 2020

@leofang Give me a day to dive a little deeper. It’s entirely possible I’m misunderstanding something and that I’m responsible for passing the parameters to the compiler myself. I thought some flags might affect the PTX -> cubin step. I’ve never had to worry about this before, so forgive me if I misspoke.

I asked internally, but maybe I didn’t word my question well.

I will also look at cuModuleLoad() and cuModuleLoadDataEx().