Add FP16 support for GatherMM kernel
🐛 Bug
I am trying to compile the library with FP16 support as per the documentation at https://docs.dgl.ai/en/0.6.x/guide/mixed_precision.html. However, when the CUDA sources start to compile, I get missing-include errors.
To Reproduce
Steps to reproduce the behavior:
git clone --recurse-submodules https://github.com/dmlc/dgl.git
cd dgl
mkdir build
cd build
cmake -DUSE_CUDA=ON -DUSE_FP16=ON …
make -j
That by itself fails with an error in an if near the beginning of make/modules/CUDA.cmake, which checks whether the CUDA version is 11 or newer. I circumvented that by removing the if (I am sure that my GPUs support that arch), but it needs to be fixed.
Later in the compilation, make -j fails with missing includes such as dgl/array.h.
I fixed this by adding the include_directories lines between the ** markers below to the macro:
macro(dgl_config_cuda out_variable)
if(NOT CUDA_FOUND)
message(FATAL_ERROR "Cannot find CUDA.")
endif()
\# always set the includedir when cuda is available
\# avoid global retrigger of cmake
include_directories(${CUDA_INCLUDE_DIRS})
**include_directories("include")
include_directories("third_party/dlpack/include")
include_directories("third_party/dmlc-core/include")
include_directories("third_party/phmap/")
include_directories("third_party/xbyak/")
include_directories("third_party/METIS/include/")
include_directories("tensoradapter/include")
include_directories("third_party/nanoflann/include")
include_directories("third_party/libxsmm/include")**
Expected behavior
With FP16 enabled, the build should just work.
Environment
- DGL Version: 0.8
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.11
- OS (e.g., Linux): Ubuntu 16.04
- How you installed DGL (conda, pip, source): source
- Build command you used (if compiling from source): cmake -DUSE_CUDA=ON -DUSE_FP16=ON … && make -j
- Python version: 3.9
- CUDA/cuDNN version (if applicable): 11.3
- GPU models and configuration (e.g. V100): A6000
- Any other relevant information:
I think something in the CMake files prevents the include directories from being passed on correctly to the CUDA compilation.
Additional context
Hmmm it seems gather-mm didn’t handle fp16: https://github.com/dmlc/dgl/blob/0227ddfb66421164834879619ff7fd8a5c6f8960/src/array/cuda/gather_mm.cu#L16-L49
@isratnisa @jermainewang would you mind adding fp16 support for that?
This should be fixed (i.e. the cublasGemm specialization for half is now present) now that PR #4029 is merged in. There are several other similarly missing specializations that I’ll try adding in a separate PR.

Sorry that it took longer than expected. Even though it was easy to add the specialization itself, there were a couple of complicating factors that prevented it from running, and then, upon fixing those, several more that prevented it from compiling on some platforms, but those should be sorted out now. 🙂
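For readers unfamiliar with the pattern behind the fix, here is a minimal CPU-only sketch of per-dtype dispatch through explicit template specializations. It is not DGL's actual code. It assumes the generic template acts as an "unsupported dtype" fallback, so a type without its own specialization fails at run time rather than at compile time. The names Status, Half, NaiveGemm and Gemm are hypothetical stand-ins; the real kernel wraps cuBLAS and uses CUDA's half type.

```cpp
#include <cassert>

// Hypothetical status codes standing in for a cuBLAS-style status enum.
enum class Status { kSuccess, kNotSupported };

// Hypothetical stand-in for a 16-bit float type. It stores a float so the
// sketch compiles and runs without CUDA.
struct Half {
  float value;
  Half(float v = 0.0f) : value(v) {}
  operator float() const { return value; }
};

// Reference row-major product C (m x n) = A (m x k) * B (k x n).
// Accumulates in double and converts back to T on store.
template <typename T>
void NaiveGemm(int m, int n, int k, const T* a, const T* b, T* c) {
  assert(m >= 0 && n >= 0 && k >= 0);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      double acc = 0.0;
      for (int p = 0; p < k; ++p) {
        acc += static_cast<double>(a[i * k + p]) *
               static_cast<double>(b[p * n + j]);
      }
      c[i * n + j] = static_cast<T>(acc);
    }
  }
}

// Generic fallback: any dtype without a specialization is rejected.
template <typename T>
Status Gemm(int, int, int, const T*, const T*, T*) {
  return Status::kNotSupported;
}

// One explicit specialization per supported dtype.
template <>
Status Gemm<float>(int m, int n, int k, const float* a, const float* b,
                   float* c) {
  NaiveGemm(m, n, k, a, b, c);
  return Status::kSuccess;
}

template <>
Status Gemm<double>(int m, int n, int k, const double* a, const double* b,
                    double* c) {
  NaiveGemm(m, n, k, a, b, c);
  return Status::kSuccess;
}

// The kind of specialization that was missing: without it, Gemm<Half>
// would silently resolve to the generic fallback above.
template <>
Status Gemm<Half>(int m, int n, int k, const Half* a, const Half* b,
                  Half* c) {
  NaiveGemm(m, n, k, a, b, c);
  return Status::kSuccess;
}
```

In this sketch, calling Gemm<int> still compiles but returns Status::kNotSupported, which illustrates how a missing specialization can go unnoticed until the code path is exercised.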