
Compilation fails on Power architecture

🐛 Bug

When compiling on an IBM AC922 GPU server, the compilation fails. This appears to be because Intel (x86) registers are being used (https://github.com/dmlc/dgl/blob/master/include/intel/cpu_support.h). However, I am not sure whether these are custom optimizations or necessary for compilation.

If this assembly code is needed, the build should check for x86 up front rather than fail like this:

/hpi/fs00/home/maximilian.boether/dgl/dgl/include/intel/cpu_support.h:70:16: error: ‘Reg64’ in namespace ‘Xbyak’ does not name a type
   70 |   const Xbyak::Reg64 &r_out_;
      |                ^~~~~
/hpi/fs00/home/maximilian.boether/dgl/dgl/include/intel/cpu_support.h:71:16: error: ‘Reg64’ in namespace ‘Xbyak’ does not name a type
   71 |   const Xbyak::Reg64 &r_left_;
      |                ^~~~~
/hpi/fs00/home/maximilian.boether/dgl/dgl/include/intel/cpu_support.h:72:16: error: ‘Reg64’ in namespace ‘Xbyak’ does not name a type
   72 |   const Xbyak::Reg64 &r_right;
      |                ^~~~~
/hpi/fs00/home/maximilian.boether/dgl/dgl/include/intel/cpu_support.h:73:16: error: ‘Reg64’ in namespace ‘Xbyak’ does not name a type
   73 |   const Xbyak::Reg64 &r_size_;
      |                ^~~~~

... and many more errors

The supported platforms should also be documented (although I would love Power support, I am not sure it is realistic in the near future). If the assembly code is not needed, I would appreciate help on how to compile on Power.

To Reproduce

Steps to reproduce the behavior:

Compile dgl from source on an IBM Power machine

Expected behavior

Clear documentation on supported platforms / platform check beforehand
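
A platform check of this kind could be sketched as follows. This is a minimal illustration in Python (for example, something a build script could run before compiling), not DGL's actual build logic; the names X86_MACHINES, is_x86, and require_x86 are hypothetical.

```python
import platform
from typing import Optional

# Hypothetical list of machine names treated as x86; not taken from DGL.
X86_MACHINES = {"x86_64", "amd64", "i386", "i686"}


def is_x86(machine: Optional[str] = None) -> bool:
    """Return True if the given (or current) machine name looks like x86."""
    name = platform.machine() if machine is None else machine
    return name.lower() in X86_MACHINES


def require_x86(machine: Optional[str] = None) -> None:
    """Fail early with a readable message instead of a wall of compiler errors."""
    name = platform.machine() if machine is None else machine
    if not is_x86(name):
        raise RuntimeError(
            f"Unsupported architecture {name!r}: x86-only code paths "
            "would have to be disabled to build here."
        )
```

On a little-endian Linux on Power system, platform.machine() typically reports ppc64le, so require_x86() would raise before any compilation starts.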

Environment

DGL Version (e.g., 1.0): master branch
Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.5
OS (e.g., Linux): Linux
How you installed DGL (conda, pip, source): source
Build command you used (if compiling from source): according to documentation
Python version: 3.7
CUDA/cuDNN version (if applicable): 10.2
GPU models and configuration (e.g. V100): NVIDIA Tesla V100-SXM2
Any other relevant information:

Issue Analytics

Created 2 years ago
Comments: 5 (1 by maintainers)

Top GitHub Comments

BarclayII commented, May 31, 2021

Does the toolchain contain its own libm? Maybe the toolchain links against its own version of libc and libm, but when you run the program it finds the system libc and libm instead. You may want to check the linker’s search path.

If so, you can add the path containing the toolchain's libc and libm to LD_LIBRARY_PATH so that Linux loads those versions instead.
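
The suggestion above can be sketched in Python. Because the dynamic loader reads LD_LIBRARY_PATH when a process starts, the sketch builds a modified environment and launches a fresh process with it rather than changing the variable in the running interpreter. The helper names and the directory in the usage comment are assumptions, not paths from the issue.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def env_with_library_path(lib_dir: str,
                          base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + current if current else "")
    return env


def run_with_library_path(cmd: List[str], lib_dir: str) -> int:
    """Run cmd in a child process that searches lib_dir for shared libraries first."""
    return subprocess.run(cmd, env=env_with_library_path(lib_dir)).returncode


# Example (hypothetical directory; substitute the toolchain's real lib directory):
# run_with_library_path([sys.executable, "-c", "import dgl"], "/path/to/toolchain/lib64")
```

Keeping the environment change in a child process leaves the current shell and interpreter untouched, which makes it easy to compare behavior with and without the extra search path.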

MaxiBoether commented, May 17, 2021

Hey,

After fixing the CMake configuration of METIS, which uses -march (not supported on Power), the build succeeded. However, after running python setup.py install, I cannot import DGL:

(dgl) [maximilian.boether@ac922-02 python]$ python
Python 3.7.7 (default, Mar 26 2020, 15:05:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dgl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/hpi/fs00/home/maximilian.boether/dgl/dgl/python/dgl/__init__.py", line 13, in <module>
    from .backend import load_backend, backend_name
  File "/hpi/fs00/home/maximilian.boether/dgl/dgl/python/dgl/backend/__init__.py", line 95, in <module>
    load_backend(get_preferred_backend())
  File "/hpi/fs00/home/maximilian.boether/dgl/dgl/python/dgl/backend/__init__.py", line 41, in load_backend
    from .._ffi.base import load_tensor_adapter # imports DGL C library
  File "/hpi/fs00/home/maximilian.boether/dgl/dgl/python/dgl/_ffi/base.py", line 44, in <module>
    _LIB, _LIB_NAME, _DIR_NAME = _load_lib()
  File "/hpi/fs00/home/maximilian.boether/dgl/dgl/python/dgl/_ffi/base.py", line 34, in _load_lib
    lib = ctypes.CDLL(lib_path[0])
  File "/hpi/fs00/home/maximilian.boether/anaconda3/envs/dgl/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib64/power9/libm.so.6: version `GLIBC_2.29' not found (required by /hpi/fs00/home/maximilian.boether/dgl/dgl/build/libdgl.so)

It seems to be related to glibc. On Power, the IBM Advanced Toolchain for Linux on Power is used, which is loaded at the beginning of the session with module load at14.0. I suspect the issue lies in the interplay between the IBM toolchain and the compilation process. I am still investigating, but maybe you have an idea. Thanks in advance for the help!
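
To narrow down a mismatch like the one in the traceback, it can help to compare the glibc version named in the error message with the one the running interpreter reports. The following is a rough diagnostic sketch using only the Python standard library; the function names are hypothetical, and platform.libc_ver() is a best-effort probe that may return empty strings on some systems.

```python
import platform
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def required_glibc_version(message: str) -> Optional[Version]:
    """Extract the GLIBC_x.y version named in a loader error message, if any."""
    match = re.search(r"GLIBC_(\d+)\.(\d+)", message)
    return (int(match.group(1)), int(match.group(2))) if match else None


def runtime_glibc_version() -> Optional[Version]:
    """Best-effort glibc version of the running interpreter (None if unknown)."""
    lib, version = platform.libc_ver()
    match = re.match(r"(\d+)\.(\d+)", version)
    if lib != "glibc" or not match:
        return None
    return (int(match.group(1)), int(match.group(2)))


def glibc_too_old(message: str, runtime: Optional[Version]) -> Optional[bool]:
    """True if the runtime glibc is older than required; None if undetermined."""
    required = required_glibc_version(message)
    if required is None or runtime is None:
        return None
    return runtime < required
```

For the OSError shown above, required_glibc_version yields (2, 29). If runtime_glibc_version() reports a lower version, that would be consistent with libdgl.so having been linked against a newer glibc than the one found at import time.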