
Add FP16 support for GatherMM kernel

See original GitHub issue

🐛 Bug

I am trying to compile the library with FP16 support, as described in the documentation at https://docs.dgl.ai/en/0.6.x/guide/mixed_precision.html. However, once the CUDA sources start to compile, I get missing-include errors.

To Reproduce

Steps to reproduce the behavior:

git clone --recurse-submodules https://github.com/dmlc/dgl.git
cd dgl
mkdir build
cd build
cmake -DUSE_CUDA=ON -DUSE_FP16=ON …
make -j

That by itself fails on an if check near the top of cmake/modules/CUDA.cmake that tries to detect whether the toolkit is CUDA 11 or newer. I worked around it by removing the if (I am sure my GPUs support that arch), but the check itself needs to be fixed.
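For context, a check of that kind usually keys the newer gencode flags off the toolkit version reported by FindCUDA. The following is only an illustrative sketch of such a guard, not DGL's actual cmake/modules/CUDA.cmake logic; it assumes CUDA_VERSION and CUDA_NVCC_FLAGS have already been populated by FindCUDA:

# Hypothetical sketch: enable the newer gencode flags only when the detected
# toolkit is new enough, and warn instead of aborting the configure step.
if(CUDA_VERSION VERSION_GREATER_EQUAL "11.0")
  list(APPEND CUDA_NVCC_FLAGS "-gencode" "arch=compute_80,code=sm_80")
else()
  message(WARNING "CUDA ${CUDA_VERSION} < 11.0; skipping sm_80 gencode flags.")
endif()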

Then later on in compilation when doing make -j I get a missing include of dgl/array.h etc…

I fixed this by adding the missing include directories to the dgl_config_cuda macro:

macro(dgl_config_cuda out_variable)
  if(NOT CUDA_FOUND)
    message(FATAL_ERROR "Cannot find CUDA.")
  endif()
  # always set the includedir when cuda is available
  # avoid global retrigger of cmake
  include_directories(${CUDA_INCLUDE_DIRS})

  # added: project and third-party headers so the CUDA sources can find them
  include_directories("include")
  include_directories("third_party/dlpack/include")
  include_directories("third_party/dmlc-core/include")
  include_directories("third_party/phmap/")
  include_directories("third_party/xbyak/")
  include_directories("third_party/METIS/include/")
  include_directories("tensoradapter/include")
  include_directories("third_party/nanoflann/include")
  include_directories("third_party/libxsmm/include")

Expected behavior

All in all, compiling with FP16 enabled should just work.

Environment

  • DGL Version: 0.8
  • Backend Library & Version: PyTorch 1.11
  • OS (e.g., Linux): Ubuntu 16.04
  • How you installed DGL (conda, pip, source): source
  • Build command you used (if compiling from source): cmake -DUSE_CUDA=ON -DUSE_FP16=ON … && make -j
  • Python version: 3.9
  • CUDA/cuDNN version (if applicable): 11.3
  • GPU models and configuration (e.g. V100): A6000
  • Any other relevant information:

I think something in the CMake setup prevents the include directories from being passed correctly to the CUDA compilation step.
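If that hypothesis is right, one way to make the header paths visible to nvcc explicitly is to hand them to cuda_include_directories() in addition to the plain include_directories() calls. This is a sketch only, under the assumption that the build still goes through CMake's FindCUDA module, which keeps its own include list for nvcc:

# Sketch only: pass the same project headers to nvcc via FindCUDA's mechanism.
cuda_include_directories(
  "${CMAKE_SOURCE_DIR}/include"
  "${CMAKE_SOURCE_DIR}/third_party/dlpack/include"
  "${CMAKE_SOURCE_DIR}/third_party/dmlc-core/include")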

Additional context

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
yzh119 commented, May 11, 2022

Hmmm it seems gather-mm didn’t handle fp16: https://github.com/dmlc/dgl/blob/0227ddfb66421164834879619ff7fd8a5c6f8960/src/array/cuda/gather_mm.cu#L16-L49

@isratnisa @jermainewang would you mind adding fp16 support for that?
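The linked block is a set of templated wrappers that dispatch to the matching cuBLAS GEMM routine per dtype. As a rough illustration of the kind of half specialization being requested here — a sketch only, not the code that was later merged, with the wrapper signature assumed to mirror the cuBLAS GEMM signature rather than DGL's exact interface — using cuBLAS's cublasHgemm:

#include <cublas_v2.h>
#include <cuda_fp16.h>

namespace {

// Generic fallback: dtypes without a specialization are reported as unsupported.
template <typename DType>
cublasStatus_t cublasGemm(cublasHandle_t handle, cublasOperation_t transa,
    cublasOperation_t transb, int m, int n, int k, const DType* alpha,
    const DType* A, int lda, const DType* B, int ldb, const DType* beta,
    DType* C, int ldc) {
  return CUBLAS_STATUS_NOT_SUPPORTED;
}

// Existing pattern: float dispatches to cublasSgemm.
template <>
cublasStatus_t cublasGemm<float>(cublasHandle_t handle, cublasOperation_t transa,
    cublasOperation_t transb, int m, int n, int k, const float* alpha,
    const float* A, int lda, const float* B, int ldb, const float* beta,
    float* C, int ldc) {
  return cublasSgemm(handle, transa, transb, m, n, k, alpha, A, lda, B, ldb,
                     beta, C, ldc);
}

// The missing piece this issue asks for: half precision via cublasHgemm.
template <>
cublasStatus_t cublasGemm<__half>(cublasHandle_t handle, cublasOperation_t transa,
    cublasOperation_t transb, int m, int n, int k, const __half* alpha,
    const __half* A, int lda, const __half* B, int ldb, const __half* beta,
    __half* C, int ldc) {
  return cublasHgemm(handle, transa, transb, m, n, k, alpha, A, lda, B, ldb,
                     beta, C, ldc);
}

}  // namespace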

1 reaction
ndickson-nvidia commented, Jun 7, 2022

This should be fixed now that PR #4029 is merged in (i.e., the cublasGemm specialization for half is now present). There are several other similarly missing specializations that I’ll try adding in a separate PR.

Sorry that it took longer than expected. Even though the specialization itself was easy to add, there were a couple of complicating factors that prevented it from running, and then, once those were fixed, several more that prevented it from compiling on some platforms. Those should all be sorted out now. 🙂


Top Results From Across the Web

  • arm64: Add support for Half precision floating point - Patchwork
    This patch adds support for detecting and exposing the same to the userspace ... uarch specific tuning decisions, HWCAP is for arch extensions...
  • An Introduction to Writing FP16 code for NVIDIA's GPUs
    The first hiccup in writing FP16 kernels is writing the host code and - for that we have 2 options to create...
  • Mixed-Precision Programming with CUDA 8 - NVIDIA Developer
    cuDNN 5.0 includes FP16 support for forward convolution, and 5.1 added support for FP16 backward convolution. All other routines in the library...
  • Alan Lawrence - [PATCH 5/14][AArch64] Add basic fp16 support
    This adds basic support for moving __fp16 values around, passing and returning, and operating on them by promoting to 32-bit floats. Also a...
  • Chapter 8: Mixed Precision Training - DGL Docs
    Message-Passing with Half Precision. DGL with fp16 support allows message-passing on float16 features for both UDFs (User Defined Functions) and built-in...
