
Build failure from source with USE_FP16=ON with CUDA 10.2 and Volta architecture


🐛 Bug

Hello, I'm trying to build DGL with FP16 support using the master branch. I used the following cmake flags:

cmake -DUSE_CUDA=ON -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="70 75" -DUSE_AVX=OFF -DBUILD_TORCH=ON -DUSE_FP16=ON  ..

and when I run make, it succeeds in compiling tensoradapter for torch and all the CPU kernels, then starts printing a bunch of these errors and fails:

/ccs/home/skrsna/dgl/src/array/cuda/./sddmm.cuh(113): error: more than one conversion function from "const half" to a built-in type applies:
            function "__half::operator float() const"
            function "__half::operator short() const"
            function "__half::operator unsigned short() const"
            function "__half::operator int() const"
            function "__half::operator unsigned int() const"
            function "__half::operator long long() const"
            function "__half::operator unsigned long long() const"
            function "__half::operator __nv_bool() const"
          detected during:
            instantiation of "void dgl::aten::cuda::SDDMMCooTreeReduceKernel(const DType *, const DType *, DType *, const Idx *, const Idx *, const Idx *, int64_t, int64_t, int64_t, int64_t, const int64_t *, const int64_t *, int64_t, int64_t, int64_t) [with Idx=int32_t, DType=half, UseBcast=false, UseIdx=false, LhsTarget=1, RhsTarget=1]" 
(235): here
            instantiation of "void dgl::aten::cuda::SDDMMCoo<Idx,DType,Op,LhsTarget,RhsTarget>(const dgl::BcastOff &, const dgl::aten::COOMatrix &, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray) [with Idx=int32_t, DType=half, Op=dgl::aten::cuda::binary::Add<half>, LhsTarget=1, RhsTarget=1]" 
/ccs/home/skrsna/dgl/src/array/cuda/sddmm.cu(106): here
            instantiation of "void dgl::aten::SDDMMCoo<XPU,IdType,bits>(const std::__cxx11::string &, const dgl::BcastOff &, const dgl::aten::COOMatrix &, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, int, int) [with XPU=2, IdType=int32_t, bits=16]" 
/ccs/home/skrsna/dgl/src/array/cuda/sddmm.cu(140): here

Error limit reached.
100 errors detected in the compilation of "/tmp/tmpxft_000163b3_00000000-8_sddmm.compute_50.cpp1.ii".
Compilation terminated.
CMake Error at dgl_generated_sddmm.cu.o.cmake:276 (message):
  Error generating file
  /ccs/home/skrsna/dgl/build/CMakeFiles/dgl.dir/src/array/cuda/./dgl_generated_sddmm.cu.o


make[2]: *** [CMakeFiles/dgl.dir/build.make:4669: CMakeFiles/dgl.dir/src/array/cuda/dgl_generated_sddmm.cu.o] Error 1

make[1]: *** [CMakeFiles/Makefile2:166: CMakeFiles/dgl.dir/all] Error 2
make: *** [Makefile:149: all] Error 2

I tried this with gcc versions 7.4.0, 8.1.0, and 8.1.1 to no avail. @romerojosh also reported similar compilation errors on a DGX V100 machine (correct me if I'm wrong). Any help with this? Thanks 🙂
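For reference, the failing translation unit above is the compute_50 pass ("sddmm.compute_50.cpp1.ii"), and compute_50 predates native half arithmetic, so any expression mixing a const half with a built-in type has to go through the implicit __half conversion operators listed in the error, several of which can apply at once. Below is a minimal standalone CUDA sketch of that kind of ambiguity and the usual workaround (an explicit conversion); it is illustrative only and assumes nothing about DGL's actual sddmm.cuh code.

#include <cuda_fp16.h>

// Hypothetical kernel, not DGL code: on an architecture without native half
// arithmetic, mixing a const half with an int literal leaves only the implicit
// __half conversion operators, more than one of which applies -- the same
// "more than one conversion function" error shown above.
__global__ void scale(const half* x, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    const half v = x[i];
    // float y = v * 2;                // ambiguous: convert v to float? int? long long?
    float y = __half2float(v) * 2.0f;  // explicit conversion is unambiguous
    // static_cast<float>(v) * 2.0f works as well.
    out[i] = y;
  }
}

int main() {
  const int n = 8;
  half* d_x = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_x, n * sizeof(half));
  cudaMalloc(&d_out, n * sizeof(float));
  scale<<<1, n>>>(d_x, d_out, n);
  cudaDeviceSynchronize();
  cudaFree(d_x);
  cudaFree(d_out);
  return 0;
}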

Expected behavior

Environment

  • DGL Version (e.g., 1.0):
    • 0.6 (from master with this hash db57809da147c663a8369c554986ca9d0b19f0ea)
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3):
    • pytorch 1.7.1
  • OS (e.g., Linux):
    • RHEL 7.6
  • How you installed DGL (conda, pip, source):
    • source
  • Build command you used (if compiling from source):
    • cmake -DUSE_CUDA=ON -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="70 75" -DUSE_AVX=OFF -DBUILD_TORCH=ON -DUSE_FP16=ON ..
  • Python version:
    • 3.8
  • CUDA/cuDNN version (if applicable):
    • 10.2.89/7.6.5_10.2
  • GPU models and configuration (e.g. V100):
    • V100
  • Any other relevant information:

Additional context

  • cmake output
-- The C compiler identification is GNU 8.1.1
-- The CXX compiler identification is GNU 8.1.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /sw/summit/gcc/8.1.1/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /sw/summit/gcc/8.1.1/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Start configuring project dgl
-- Build with CUDA support
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /sw/summit/cuda/10.2.89 (found version "10.2") 
-- Found CUDA_TOOLKIT_ROOT_DIR=/sw/summit/cuda/10.2.89
-- Found CUDA_CUDART_LIBRARY=/sw/summit/cuda/10.2.89/lib64/libcudart.so
-- Found CUDA_CUBLAS_LIBRARY=/ccs/home/skrsna/.conda/envs/builder/lib/libcublas.so
-- Performing Test SUPPORT_CXX14
-- Performing Test SUPPORT_CXX14 - Success
-- Detected CUDA of version 10.2. Use external CUB/Thrust library.
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Build with OpenMP.
-- Build with fp16 to support mixed precision training
-- -fopenmp -O2 -Wall -fPIC -std=c++11  -DUSE_FP16 -DIDXTYPEWIDTH=64 -DREALTYPEWIDTH=32
-- CUDA flags: -Xcompiler ,-fopenmp,-O2,-Wall,-fPIC,,,-DUSE_FP16,-DIDXTYPEWIDTH=64,-DREALTYPEWIDTH=32;--expt-relaxed-constexpr;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50;--expt-extended-lambda;-Wno-deprecated-declarations;-std=c++14
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include  
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Searching 16 bit integer - Using unsigned short
-- Check if the system is big endian - little endian
-- /ccs/home/skrsna/dgl/third_party/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Looking for execinfo.h
-- Looking for execinfo.h - found
-- Looking for getline
-- Looking for getline - found
-- Configuring done
-- Generating done
-- Build files have been written to: /ccs/home/skrsna/dgl/build

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
zhaone commented, Mar 10, 2021

Hi, must I compile DGL from source if I want to use the mixed precision feature, or can I just use pip? I tried pip install dgl-cu101==0.6.0, but it did not work; I got:

dgl._ffi.base.DGLError: [12:55:52] /opt/dgl/src/array/cuda/sddmm.cu:112: Data type not renogized with bits 16

I also tried to build DGL using cmake -DUSE_CUDA=ON -DUSE_FP16=ON .., but that failed too. The error is:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
    linked by target "dgl" in directory /home/zhaoyi/dgl

My env:

OS: Linux 0f7fe7279276 4.9.0-14-amd64 #1 SMP Debian 4.9.246-2 (2020-12-17) x86_64 x86_64 x86_64 GNU/Linux
cmake version: 3.10.2
Cuda version: 10.1
torch.__version__: '1.6.0+cu101'
0 reactions
HoytWen commented, Sep 20, 2021

Hi @yzh119, I failed to compile the DGL source files at the same step as @skrsna, although the cmake output is a bit different.

cmake output

-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Start configuring project dgl
-- Build with CUDA support
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda-11.1 (found version "11.1") 
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.1
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda-11.1/lib64/libcudart.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/local/cuda-11.1/lib64/libcublas.so
-- Found CUDA_CURAND_LIBRARY=/usr/local/cuda-11.1/lib64/libcurand.so
-- Performing Test SUPPORT_CXX14
-- Performing Test SUPPORT_CXX14 - Success
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Build with OpenMP.
-- Build with LIBXSMM optimization.
-- Build with fp16 to support mixed precision training
-- -fopenmp -O2 -Wall -fPIC -std=c++11  -DUSE_AVX -DUSE_LIBXSMM -DDGL_CPU_LLC_SIZE=40000000 -DUSE_FP16 -DIDXTYPEWIDTH=64 -DREALTYPEWIDTH=32
-- Running GPU architecture autodetection
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
-- Found CUDA arch 7.5 7.5
-- CUDA flags: -Xcompiler ,-fopenmp,-O2,-Wall,-fPIC,,,-DUSE_AVX,-DUSE_LIBXSMM,-DDGL_CPU_LLC_SIZE=40000000,-DUSE_FP16,-DIDXTYPEWIDTH=64,-DREALTYPEWIDTH=32;--expt-relaxed-constexpr;-gencode;arch=compute_75,code=sm_75;--expt-extended-lambda;-Wno-deprecated-declarations;-std=c++14
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include  
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- /home/zja/dgl/third_party/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Success
-- Looking for execinfo.h
-- Looking for execinfo.h - found
-- Looking for getline
-- Looking for getline - found
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
    linked by target "dgl" in directory /home/zja/dgl

-- Configuring incomplete, errors occurred!

Could you please specify the right command I should use? Thanks! Some of the environment information is as follows:

Environment information

  • python version: 3.8
  • backend library version: torch==1.9.0+cu111
  • OS: Ubuntu 18.04.5
  • CUDA version: 11.2
  • GPU: RTX 2080ti
  • build command:
cmake -DUSE_CUDA=ON -DUSE_FP16=ON ..
