Successfully compiled and installed MMCV-FULL on gfx803, ROCM4.3.1, pytorch1.9.1 platform
Describe the Issue
When I was compiling mmcv-full, I ran into the following problems:
1. hipify() is incomplete. The new .hip files generated under the mmcv/ops/csrc/common/hip and mmcv/ops/csrc/pytorch/hip paths require manual changes to the required pytorch_xxx_helper.hpp include, and the files under mmcv/ops/csrc/pytorch/hip also need the names of the included .cuh files changed, among other things. For example, after hipify() there is #include "ball_query_cuda_kernel.cuh" in the file ball_query_hip.hip, which should be #include "ball_query_hip_kernel.cuh". The error information looks like this:
FAILED: /root/Code/mmcv_issue/build/temp.linux-x86_64-3.8/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/focal_loss_hip.o
/opt/rocm-4.3.0/bin/hipcc -DMMCV_WITH_CUDA -DHIP_DIFF -I/root/Code/mmcv_issue/mmcv/ops/csrc/common/hip -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/TH -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/THC -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/THH -I/opt/rocm-4.3.0/include -I/opt/rocm-4.3.0/miopen/include -I/root/miniconda3/envs/open-mmlab/include/python3.8 -c -c /root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/focal_loss_hip.hip -o /root/Code/mmcv_issue/build/temp.linux-x86_64-3.8/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/focal_loss_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=1 --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 -fno-gpu-rdc -std=c++14
/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/focal_loss_hip.hip:4:10: fatal error: 'pytorch_cuda_helper.hpp' file not found
#include "pytorch_cuda_helper.hpp"
^~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated when compiling for gfx803.
2. include_dirs is incomplete. After I manually changed the files involved in issue 1, the hipified files still could not find pytorch_hip_helper.hpp, because setup.py does not add mmcv/ops/csrc/common to include_dirs. But when I add mmcv/ops/csrc/common to include_dirs in setup.py, the files that I manually changed to hip in issue 1 are changed back to cuda; for example, in pytorch_hip_helper.hpp, #include "common_hip_helper.hpp" becomes #include "common_cuda_helper.hpp" again. The error information looks like this:
FAILED: /root/Code/mmcv_issue/build/temp.linux-x86_64-3.8/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/roi_align_hip.o
/opt/rocm-4.3.0/bin/hipcc -DMMCV_WITH_CUDA -DHIP_DIFF -I/root/Code/mmcv_issue/mmcv/ops/csrc/common/hip -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/TH -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/THC -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/THH -I/opt/rocm-4.3.0/include -I/opt/rocm-4.3.0/miopen/include -I/root/miniconda3/envs/open-mmlab/include/python3.8 -c -c /root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/roi_align_hip.hip -o /root/Code/mmcv_issue/build/temp.linux-x86_64-3.8/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/roi_align_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=1 --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 -fno-gpu-rdc -std=c++14
/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/roi_align_hip.hip:4:10: fatal error: 'pytorch_hip_helper.hpp' file not found
#include "pytorch_hip_helper.hpp"
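Both "file not found" errors above come down to a quoted #include that does not resolve in any of the -I directories passed to hipcc. A minimal sketch for checking this before a full build is below. The helper name unresolved_includes is hypothetical (it is not part of mmcv or PyTorch), and it only looks at quoted includes, searching the file's own directory first and then the given include directories; it ignores angle-bracket includes and conditional compilation.

```python
import os
import re

# Matches lines such as: #include "pytorch_hip_helper.hpp"
_QUOTED_INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"', re.MULTILINE)


def unresolved_includes(source_path, include_dirs):
    """Return quoted includes that exist neither next to the file nor in include_dirs.

    Hypothetical helper: a rough approximation of the compiler's lookup,
    not a replacement for it.
    """
    with open(source_path, encoding="utf-8") as f:
        text = f.read()
    # Quoted includes are looked up relative to the including file first.
    search = [os.path.dirname(os.path.abspath(source_path))] + list(include_dirs)
    missing = []
    for name in _QUOTED_INCLUDE.findall(text):
        if not any(os.path.isfile(os.path.join(d, name)) for d in search):
            missing.append(name)
    return missing
```

Calling it on a generated .hip file with the same directories that appear after -I in the failing command should list the headers that still need to be renamed or copied.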
3. op_files is incomplete. Well, I was finally able to compile the _ext module, but after everything was done, when I tried to import the module, I got this:
import mmcv._ext
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: dynamic module does not define module export function (PyInit__ext)
So I compared against the temporary build directory from a CUDA environment and found that the *.cpp files under mmcv/ops/csrc/pytorch/ are also compiled there. The ROCm part of the code in setup.py does not add them to op_files, so the *.cpp files under mmcv/ops/csrc/pytorch/ are not compiled. The build still passes, which results in the error when importing mmcv._ext.
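PyInit__ext is the init function CPython looks for when importing a module named _ext, and in a pybind11 extension that function is generated by the PYBIND11_MODULE macro. If no compiled source contains that macro, the build can still link but the import fails as shown. A rough sketch of a pre-build check follows; defines_module_init is a hypothetical helper, and it assumes the bindings are declared with PYBIND11_MODULE in one of the .cpp files (a plain text search, so a match inside a comment would also count).

```python
def defines_module_init(source_paths):
    """Return True if any listed source file contains a PYBIND11_MODULE( declaration.

    Hypothetical helper: a plain substring search over the files that
    would be handed to the extension build.
    """
    for path in source_paths:
        with open(path, encoding="utf-8", errors="ignore") as f:
            if "PYBIND11_MODULE(" in f.read():
                return True
    return False
```

Running it on the op_files list from setup.py before building would flag the situation described here, where only the hip/ sources were collected.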
Reproduction
- What command, code, or script did you run?
MMCV_WITH_OPS=1 python3 setup.py develop
- Did you make any modifications on the code? Did you understand what you have modified?
Environment
'sys.platform': 'linux', 'Python': '3.8.11 (default, Aug 3 2021, 15:09:35) [GCC 7.5.0]', 'CUDA available': True, 'GPU 0': 'Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]', 'CUDA_HOME': '/opt/rocm-4.3.0', 'NVCC': '', 'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0', 'PyTorch': '1.9.0a0+gitdfbd030', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201402\n - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - HIP Runtime 40321.30\n - MIOpen 2.12.0\n - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n', 'TorchVision': '0.10.0a0+ca1a620', 'OpenCV': '4.5.3', 'MMCV': '1.3.14', 'MMCV Compiler': 'GCC 9.3', 'MMCV CUDA Compiler': 
'rocm not vailable'
Bug fix
1. Manual changes. After the first compilation (or you can run hipify manually), the build generates the hipified code in mmcv/ops/csrc/common/hip and mmcv/ops/csrc/pytorch/hip. Then I commented out this code in the setup.py file:
# from torch.utils.hipify import hipify_python
# hipify_python.hipify(
#     project_directory=project_dir,
#     output_directory=project_dir,
#     includes='mmcv/ops/csrc/*',
#     show_detailed=True,
#     is_pytorch_extension=True,
# )
Otherwise, hipify() will overwrite your changes in the next compilation.
After hipify(), for example, there is #include "ball_query_cuda_kernel.cuh" in the file ball_query_hip.hip, which needs to be changed to #include "ball_query_hip_kernel.cuh".
2. Copy the required files from the common/ path to the common/hip/ path, except for parrots*.hpp and *_cuda_*.hpp.
3. I changed op_files = glob.glob('./mmcv/ops/csrc/pytorch/hip/*') to op_files = glob.glob('./mmcv/ops/csrc/pytorch/*.cpp') + glob.glob('./mmcv/ops/csrc/pytorch/hip/*') in setup.py, which finally compiles the _ext module successfully.
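The three manual steps can be sketched as small Python helpers. This is only an illustration under stated assumptions, not the code that went into mmcv: the function names are hypothetical, fix_includes assumes every quoted include containing _cuda_ has a hipified counterpart with _hip_ in its name, and copy_common_headers assumes the "required files" are the top-level *.hpp headers and that an existing file in hip/ should never be overwritten.

```python
import fnmatch
import glob
import os
import re
import shutil

_INCLUDE = re.compile(r'(#include\s+")([^"]+)(")')
_SKIP = ("parrots*.hpp", "*_cuda_*.hpp")  # exclusions named in step 2


def fix_includes(text):
    """Step 1: rename _cuda_ to _hip_ inside quoted #include names only."""
    def _rename(match):
        name = match.group(2).replace("_cuda_", "_hip_")
        return match.group(1) + name + match.group(3)
    return _INCLUDE.sub(_rename, text)


def copy_common_headers(common_dir):
    """Step 2: copy *.hpp from common/ into common/hip/, skipping _SKIP patterns."""
    hip_dir = os.path.join(common_dir, "hip")
    os.makedirs(hip_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(common_dir, "*.hpp"))):
        name = os.path.basename(path)
        if any(fnmatch.fnmatch(name, pattern) for pattern in _SKIP):
            continue
        dest = os.path.join(hip_dir, name)
        if not os.path.exists(dest):  # assumption: keep hand-edited files
            shutil.copy2(path, dest)
            copied.append(name)
    return copied


def collect_op_files(csrc_pytorch_dir):
    """Step 3: binding .cpp files plus everything generated under hip/."""
    return (glob.glob(os.path.join(csrc_pytorch_dir, "*.cpp")) +
            glob.glob(os.path.join(csrc_pytorch_dir, "hip", "*")))
```

For example, fix_includes('#include "ball_query_cuda_kernel.cuh"') returns '#include "ball_query_hip_kernel.cuh"', while text outside quoted includes is left alone. These helpers only help if the hipify_python.hipify() call stays disabled, since otherwise the next build regenerates the files.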
Yes, I manually changed the setup.py code, and can use mmcv-full smoothly now.
Thank you very much for your contribution; your bug fix is very helpful to us. We will create a PR to resolve the error in the next few days.