
Add FP16 support for GatherMM kernel

See original GitHub issue

🐛 Bug

I am trying to compile the library with FP16 support, as described in the documentation at https://docs.dgl.ai/en/0.6.x/guide/mixed_precision.html. However, once the CUDA sources start to compile, I get missing-include errors.

To Reproduce

Steps to reproduce the behavior:

git clone --recurse-submodules https://github.com/dmlc/dgl.git
cd dgl
mkdir build
cd build
cmake -DUSE_CUDA=ON -DUSE_FP16=ON …
make -j

That by itself fails on an if check near the top of cmake/modules/CUDA.cmake that tries to detect whether the toolkit is CUDA 11 or newer. I worked around it by removing the if (I am sure my GPUs support that arch), but the check itself needs to be fixed.
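For context, a check of that kind usually keys the newer gencode flags off the toolkit version reported by FindCUDA. The following is only an illustrative sketch of such a guard, not DGL's actual cmake/modules/CUDA.cmake logic; it assumes CUDA_VERSION and CUDA_NVCC_FLAGS have already been populated by FindCUDA:

# Hypothetical sketch: enable the newer gencode flags only when the detected
# toolkit is new enough, and warn instead of aborting the configure step.
if(CUDA_VERSION VERSION_GREATER_EQUAL "11.0")
  list(APPEND CUDA_NVCC_FLAGS "-gencode" "arch=compute_80,code=sm_80")
else()
  message(WARNING "CUDA ${CUDA_VERSION} < 11.0; skipping sm_80 gencode flags.")
endif()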

Then later on in compilation when doing make -j I get a missing include of dgl/array.h etc…

I fixed this by adding the missing include directories to the dgl_config_cuda macro:

macro(dgl_config_cuda out_variable)
  if(NOT CUDA_FOUND)
    message(FATAL_ERROR "Cannot find CUDA.")
  endif()
  # always set the includedir when cuda is available
  # avoid global retrigger of cmake
  include_directories(${CUDA_INCLUDE_DIRS})

  # added: project and third-party headers so the CUDA sources can find them
  include_directories("include")
  include_directories("third_party/dlpack/include")
  include_directories("third_party/dmlc-core/include")
  include_directories("third_party/phmap/")
  include_directories("third_party/xbyak/")
  include_directories("third_party/METIS/include/")
  include_directories("tensoradapter/include")
  include_directories("third_party/nanoflann/include")
  include_directories("third_party/libxsmm/include")

Expected behavior

All in all, compiling with FP16 enabled should just work.

Environment

  • DGL Version: 0.8
  • Backend Library & Version: PyTorch 1.11
  • OS (e.g., Linux): Ubuntu 16.04
  • How you installed DGL (conda, pip, source): source
  • Build command you used (if compiling from source): cmake -DUSE_CUDA=ON -DUSE_FP16=ON … && make -j
  • Python version: 3.9
  • CUDA/cuDNN version (if applicable): 11.3
  • GPU models and configuration (e.g. V100): A6000
  • Any other relevant information:

I think something in the CMake setup prevents the include directories from being passed correctly to the CUDA compilation step.
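If that hypothesis is right, one way to make the header paths visible to nvcc explicitly is to hand them to cuda_include_directories() in addition to the plain include_directories() calls. This is a sketch only, under the assumption that the build still goes through CMake's FindCUDA module, which keeps its own include list for nvcc:

# Sketch only: pass the same project headers to nvcc via FindCUDA's mechanism.
cuda_include_directories(
  "${CMAKE_SOURCE_DIR}/include"
  "${CMAKE_SOURCE_DIR}/third_party/dlpack/include"
  "${CMAKE_SOURCE_DIR}/third_party/dmlc-core/include")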

Additional context

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
yzh119 commented, May 11, 2022

Hmmm it seems gather-mm didn’t handle fp16: https://github.com/dmlc/dgl/blob/0227ddfb66421164834879619ff7fd8a5c6f8960/src/array/cuda/gather_mm.cu#L16-L49

@isratnisa @jermainewang would you mind adding fp16 support for that?
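The linked block is a set of templated wrappers that dispatch to the matching cuBLAS GEMM routine per dtype. As a rough illustration of the kind of half specialization being requested here — a sketch only, not the code that was later merged, with the wrapper signature assumed to mirror the cuBLAS GEMM signature rather than DGL's exact interface — using cuBLAS's cublasHgemm:

#include <cublas_v2.h>
#include <cuda_fp16.h>

namespace {

// Generic fallback: dtypes without a specialization are reported as unsupported.
template <typename DType>
cublasStatus_t cublasGemm(cublasHandle_t handle, cublasOperation_t transa,
    cublasOperation_t transb, int m, int n, int k, const DType* alpha,
    const DType* A, int lda, const DType* B, int ldb, const DType* beta,
    DType* C, int ldc) {
  return CUBLAS_STATUS_NOT_SUPPORTED;
}

// Existing pattern: float dispatches to cublasSgemm.
template <>
cublasStatus_t cublasGemm<float>(cublasHandle_t handle, cublasOperation_t transa,
    cublasOperation_t transb, int m, int n, int k, const float* alpha,
    const float* A, int lda, const float* B, int ldb, const float* beta,
    float* C, int ldc) {
  return cublasSgemm(handle, transa, transb, m, n, k, alpha, A, lda, B, ldb,
                     beta, C, ldc);
}

// The missing piece this issue asks for: half precision via cublasHgemm.
template <>
cublasStatus_t cublasGemm<__half>(cublasHandle_t handle, cublasOperation_t transa,
    cublasOperation_t transb, int m, int n, int k, const __half* alpha,
    const __half* A, int lda, const __half* B, int ldb, const __half* beta,
    __half* C, int ldc) {
  return cublasHgemm(handle, transa, transb, m, n, k, alpha, A, lda, B, ldb,
                     beta, C, ldc);
}

}  // namespace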

1 reaction
ndickson-nvidia commented, Jun 7, 2022

This should be fixed now that PR #4029 is merged in (i.e., the cublasGemm specialization for half is now present). There are several other similarly missing specializations that I’ll try adding in a separate PR.

Sorry that it took longer than expected. Even though the specialization itself was easy to add, there were a couple of complicating factors that prevented it from running, and then, once those were fixed, several more that prevented it from compiling on some platforms. Those should all be sorted out now. 🙂


Top Results From Across the Web

  • arm64: Add support for Half precision floating point - Patchwork
    This patch adds support for detecting and exposing the same to the userspace ... uarch specific tuning decisions, HWCAP is for arch extensions...
  • An Introduction to Writing FP16 code for NVIDIA's GPUs
    The first hiccup in writing FP16 kernels is writing the host code and - for that we have 2 options to create...
  • Mixed-Precision Programming with CUDA 8 - NVIDIA Developer
    cuDNN 5.0 includes FP16 support for forward convolution, and 5.1 added support for FP16 backward convolution. All other routines in the library...
  • Alan Lawrence - [PATCH 5/14][AArch64] Add basic fp16 support
    This adds basic support for moving __fp16 values around, passing and returning, and operating on them by promoting to 32-bit floats. Also a...
  • Chapter 8: Mixed Precision Training - DGL Docs
    Message-Passing with Half Precision. DGL with fp16 support allows message-passing on float16 features for both UDFs (User Defined Functions) and built-in...
