question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

JIT compile fail when using `functional.generalized_rspmm` with CUDA on Linux

See original GitHub issue

Hey,

Most likely this is an error with torch drug itself however when I try to run any of the examples from the readme, the code will crash with the following error:

spmm.cuda.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/THC -isystem /opt/scp/software/CUDA/11.1.0/include -isystem /home/miniconda3/envs/path/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
FAILED: spmm.cuda.o
/opt/scp/software/CUDA/11.1.0/bin/nvcc --generate-dependencies-with-compile --dependency-output spmm.cuda.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/THC -isystem /opt/scp/software/CUDA/11.1.0/include -isystem /home/user/miniconda3/envs/path/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
/opt/software/CUDA/11.1.0/include/cuComplex.h: In function ‘float cuCabsf(cuFloatComplex)’:
/opt/software/CUDA/11.1.0/include/cuComplex.h:179:16: error: expected ‘)’ before numeric constant

This only occurs on a GPU linux machine, which is using CUDA 11.1 and GCC 10.3.

The conda env is as follows:

blas                      1.0                         mkl
boost                     1.74.0           py38hc10631b_3    conda-forge
boost-cpp                 1.74.0               h9359b55_0    conda-forge
brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
cairo                     1.16.0            h3fc0475_1005    conda-forge
certifi                   2021.10.8        py38h578d9bd_1    conda-forge
cffi                      1.15.0           py38hd667e15_1
charset-normalizer        2.0.10             pyhd8ed1ab_0    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
cryptography              35.0.0           py38ha5dfef3_0    conda-forge
cudatoolkit               11.1.1               h6406543_8    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
decorator                 4.4.2                      py_0    conda-forge
easydict                  1.9                        py_0    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
glib                      2.69.1               h4ff587b_1
icu                       67.1                 he1b5a44_0    conda-forge
idna                      3.3                pyhd8ed1ab_0    conda-forge
intel-openmp              2021.4.0          h06a4308_3561
jinja2                    3.0.3              pyhd8ed1ab_0    conda-forge
joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
kiwisolver                1.3.1            py38h2531618_0
ld_impl_linux-64          2.35.1               h7274673_9
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.3.0               h5101ec6_17
libgfortran-ng            7.5.0               h14aa051_19    conda-forge
libgfortran4              7.5.0               h14aa051_19    conda-forge
libgomp                   9.3.0               h5101ec6_17
libiconv                  1.16                 h516909a_0    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libstdcxx-ng              9.3.0               hd4cf53a_17
libtiff                   4.0.10            hc3755c2_1005    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libuv                     1.42.0               h7f98852_0    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxml2                   2.9.10               h68273f3_2    conda-forge
littleutils               0.2.2                      py_0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
markupsafe                2.0.1            py38h497a2fe_0    conda-forge
matplotlib                3.2.2                         1    conda-forge
matplotlib-base           3.2.2            py38h5d868c9_1    conda-forge
mkl                       2021.4.0           h06a4308_640
mkl-service               2.4.0            py38h497a2fe_0    conda-forge
mkl_fft                   1.3.1            py38hd3c417c_0
mkl_random                1.2.2            py38h1abd341_0    conda-forge
ncurses                   6.3                  h7f8727e_2
networkx                  2.5.1              pyhd8ed1ab_0    conda-forge
ninja                     1.10.2               h4bd325d_0    conda-forge
numpy                     1.21.2           py38h20f2e39_0
numpy-base                1.21.2           py38h79a1101_0
ogb                       1.3.2              pyhd8ed1ab_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openssl                   1.1.1m               h7f8727e_0
outdated                  0.2.1              pyhd8ed1ab_0    conda-forge
pandas                    1.2.5            py38h1abd341_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pillow                    6.2.1            py38h6b7be26_0    conda-forge
pip                       21.2.4           py38h06a4308_0
pixman                    0.38.0            h516909a_1003    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pycairo                   1.20.1           py38hf61ee4a_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyopenssl                 21.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.7              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1            py38h578d9bd_4    conda-forge
python                    3.8.12               h12debd9_0
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.8.2           py3.8_cuda11.1_cudnn8.0.5_0    pytorch-lts
pytorch-scatter           2.0.8           py38_torch_1.8.0_cu111    pyg
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
pyyaml                    5.4.1            py38h497a2fe_0    conda-forge
rdkit                     2020.09.5        py38h2bca085_0    conda-forge
readline                  8.1.2                h7f8727e_1
reportlab                 3.5.68           py38hadf75a6_0    conda-forge
requests                  2.27.1             pyhd8ed1ab_0    conda-forge
scikit-learn              1.0.2            py38h51133e4_1
scipy                     1.7.3            py38hc147768_0
setuptools                58.0.4           py38h06a4308_0
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlalchemy                1.3.23           py38h497a2fe_0    conda-forge
sqlite                    3.37.0               hc218d9a_0
threadpoolctl             3.0.0              pyh8a188c0_0    conda-forge
tk                        8.6.11               h1ccaba5_0
torchdrug                 0.1.2                  ha710097    milagraph
tornado                   6.1              py38h497a2fe_1    conda-forge
tqdm                      4.62.3             pyhd8ed1ab_0    conda-forge
typing_extensions         4.0.1              pyha770c72_0    conda-forge
urllib3                   1.26.8             pyhd8ed1ab_1    conda-forge
wheel                     0.37.1             pyhd3eb1b0_0
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h7b6447c_0
yaml                      0.2.5                h516909a_0    conda-forge
zlib                      1.2.11               h7f8727e_4
zstd                      1.4.9                ha95c52a_0    conda-forge

Any ideas how to get this to run?

Many thanks!

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Comments:5 (3 by maintainers)

github_iconTop GitHub Comments

1reaction
KiddoZhucommented, Jan 26, 2022

By the interface of TorchDrug, it is hard to dump checkpoint at the half of a epoch. But you can override the length of an epoch with the argument batch_per_epoch in solver.train. That can achieve a similar effect.

0reactions
sbonner0commented, Jan 25, 2022

Hey @KiddoZhu I managed to solve this and it seemed to be some weird mismatch between conda and the native CUDA libraries installed on the HPC system. I now have it running using the python provided on the HPC and everything is working.

Thanks for your quick reply! I will now close the issue but I had a quick question about model check pointing – are checkpoints only saved at the end of each epoch? If so, can this be changed to save after n update steps?

Read more comments on GitHub >

github_iconTop Results From Across the Web

PTX JIT compiler failed · Issue #314 · NVIDIA/apex - GitHub
I installed as such: pytorch 1.1. torchvison 2.1 (both from conda -c python). built apex. And now I'm getting this error: File ...
Read more >
CUDA Compiler Driver NVCC - NVIDIA Documentation Center
The CUDA compilation trajectory separates the device functions from the host code, compiles the device functions using the proprietary NVIDIA compilers and ...
Read more >
Troubleshooting and tips - Numba
A common reason for Numba failing to compile (especially in nopython mode) ... a compiled JIT function is that compiling in nopython mode...
Read more >
5 minute guide to Numba
Should the compilation in nopython mode fail, Numba can compile using object mode . This is a fall back mode for the @jit...
Read more >
Compiling CUDA code fails - Using Clang - LLVM Discourse
Workaround ideas: Modify the code to pass a T() and implement __nv_is_extended_device_lambda_closure_type as consteval function in user space. Teach clang about ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found