Compilation fails on Power architecture
🐛 Bug
Compilation fails on an IBM AC922 GPU server. The cause appears to be x86-specific code: include/intel/cpu_support.h (https://github.com/dmlc/dgl/blob/master/include/intel/cpu_support.h) uses Intel register types from Xbyak. I am not sure whether this code is an optional optimization or required for compilation.
If this assembly code is required, the build should check beforehand that it is running on x86 instead of failing like this:
/hpi/fs00/home/maximilian.boether/dgl/dgl/include/intel/cpu_support.h:70:16: error: ‘Reg64’ in namespace ‘Xbyak’ does not name a type
70 | const Xbyak::Reg64 &r_out_;
| ^~~~~
/hpi/fs00/home/maximilian.boether/dgl/dgl/include/intel/cpu_support.h:71:16: error: ‘Reg64’ in namespace ‘Xbyak’ does not name a type
71 | const Xbyak::Reg64 &r_left_;
| ^~~~~
/hpi/fs00/home/maximilian.boether/dgl/dgl/include/intel/cpu_support.h:72:16: error: ‘Reg64’ in namespace ‘Xbyak’ does not name a type
72 | const Xbyak::Reg64 &r_right;
| ^~~~~
/hpi/fs00/home/maximilian.boether/dgl/dgl/include/intel/cpu_support.h:73:16: error: ‘Reg64’ in namespace ‘Xbyak’ does not name a type
73 | const Xbyak::Reg64 &r_size_;
| ^~~~~
... and many more errors
The limitation should also be documented (I would love Power support, but I am not sure it is realistic in the near future). If the assembly code is not required, I would appreciate help with compiling on Power.
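The kind of up-front platform check asked for here could be sketched in Python, for example as a guard in a build script. This is only an illustration under stated assumptions: the function names `is_x86_64` and `require_x86_64` and the list of machine strings are hypothetical and not part of DGL's build system.

```python
import platform

# Machine strings commonly reported for 64-bit x86 (assumed list, not exhaustive).
X86_64_NAMES = {"x86_64", "amd64"}


def is_x86_64(machine=None):
    """Return True if `machine` (default: this host) denotes 64-bit x86."""
    if machine is None:
        machine = platform.machine()
    return machine.lower() in X86_64_NAMES


def require_x86_64(machine=None):
    """Fail early with a readable message instead of deep inside the C++ build."""
    name = platform.machine() if machine is None else machine
    if not is_x86_64(name):
        raise RuntimeError(
            f"Unsupported architecture {name!r}: the x86-specific code path "
            "needs x86_64. Disable it or build on x86_64."
        )
```

Calling `require_x86_64()` before the compile step would turn the wall of `Reg64` errors into a single message. On a Power machine running Linux, `platform.machine()` typically reports `ppc64le`, which this check rejects.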
To Reproduce
Steps to reproduce the behavior:
Compile dgl from source on an IBM Power machine
Expected behavior
Clear documentation on supported platforms / platform check beforehand
Environment
- DGL Version (e.g., 1.0): master branch
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.5
- OS (e.g., Linux): Linux
- How you installed DGL (conda, pip, source): source
- Build command you used (if compiling from source): according to documentation
- Python version: 3.7
- CUDA/cuDNN version (if applicable): 10.2
- GPU models and configuration (e.g. V100): NVIDIA Tesla V100-SXM2
- Any other relevant information:
Does the toolchain contain its own libm? Maybe the toolchain is linking to another version of libc and libm but when you run the program it finds the system libc and libm instead. You may want to check the linker’s search path.
If so, you can add the directory containing libc and libm to LD_LIBRARY_PATH so that Linux loads the libraries from those versions instead.
Hey,
after fixing the CMake configuration of METIS, which uses -march (not supported on Power), it compiled. However, after running python setup.py install, I cannot import DGL:
It seems to have something to do with glibc. On Power, the IBM Advance Toolchain for Linux on Power is used, which is loaded at the beginning using module load at14.0. I think the issue lies somewhere in the interplay between the IBM toolchain and the compilation process… I am trying to figure out more, but maybe you have an idea. Thanks already for the help!
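One way to test the theory that the process picks up the system libc and libm rather than the toolchain's copies is to inspect which shared objects the Python process has actually mapped. The sketch below is Linux-only and reads /proc/self/maps. The helper names `loaded_libraries` and `current_process_libraries` are hypothetical, and matching on the file-name prefixes `libc.`/`libc-` and `libm.`/`libm-` is an assumption about how those libraries are named.

```python
import os


def loaded_libraries(maps_text, names=("libc", "libm")):
    """Return sorted paths of mapped files whose base name matches one of `names`."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        base = os.path.basename(path)
        # Match e.g. libc.so.6 or libc-2.31.so, but not libcuda.so.1.
        if any(base.startswith(n + ".") or base.startswith(n + "-") for n in names):
            found.add(path)
    return sorted(found)


def current_process_libraries():
    """Read this process's memory map (Linux only) and report libc/libm paths."""
    with open("/proc/self/maps") as handle:
        return loaded_libraries(handle.read())
```

Printing `current_process_libraries()` in the same interpreter session that fails to import DGL shows whether the reported paths lie inside the toolchain's directory or in the system library directories, which indicates whether an LD_LIBRARY_PATH change is worth trying.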