Successfully compiled and installed MMCV-FULL on gfx803, ROCM4.3.1, pytorch1.9.1 platform


Describe the Issue

When compiling mmcv-full, I ran into the following three problems:

  1. hipify() is incomplete. The new .hip files generated under mmcv/ops/csrc/common/hip and mmcv/ops/csrc/pytorch/hip still need manual changes: the pytorch_xxx_helper.hpp include names have to be updated, and in the files under mmcv/ops/csrc/pytorch/hip the included .cuh names have to be changed as well, among other things. For example, after hipify() the file ball_query_hip.hip contains #include "ball_query_cuda_kernel.cuh", which should be #include "ball_query_hip_kernel.cuh". The error looks like this:
FAILED: /root/Code/mmcv_issue/build/temp.linux-x86_64-3.8/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/focal_loss_hip.o 
/opt/rocm-4.3.0/bin/hipcc  -DMMCV_WITH_CUDA -DHIP_DIFF -I/root/Code/mmcv_issue/mmcv/ops/csrc/common/hip -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/TH -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/THC -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/THH -I/opt/rocm-4.3.0/include -I/opt/rocm-4.3.0/miopen/include -I/root/miniconda3/envs/open-mmlab/include/python3.8 -c -c /root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/focal_loss_hip.hip -o /root/Code/mmcv_issue/build/temp.linux-x86_64-3.8/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/focal_loss_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=1 --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 -fno-gpu-rdc -std=c++14
/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/focal_loss_hip.hip:4:10: fatal error: 'pytorch_cuda_helper.hpp' file not found
#include "pytorch_cuda_helper.hpp"
         ^~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated when compiling for gfx803.
  2. include_dirs is incomplete. After I manually fixed the files involved in problem 1, the build still failed: setup.py does not add mmcv/ops/csrc/common to include_dirs, so the hipified files cannot find pytorch_hip_helper.hpp. But when I add mmcv/ops/csrc/common to include_dirs in setup.py, the build changes the files I had manually switched to hip in problem 1 back to cuda. For example, in pytorch_hip_helper.hpp, #include "common_hip_helper.hpp" becomes #include "common_cuda_helper.hpp" again. The error looks like this:
FAILED: /root/Code/mmcv_issue/build/temp.linux-x86_64-3.8/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/roi_align_hip.o 
/opt/rocm-4.3.0/bin/hipcc  -DMMCV_WITH_CUDA -DHIP_DIFF -I/root/Code/mmcv_issue/mmcv/ops/csrc/common/hip -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/TH -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/THC -I/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/include/THH -I/opt/rocm-4.3.0/include -I/opt/rocm-4.3.0/miopen/include -I/root/miniconda3/envs/open-mmlab/include/python3.8 -c -c /root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/roi_align_hip.hip -o /root/Code/mmcv_issue/build/temp.linux-x86_64-3.8/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/roi_align_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=1 --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 -fno-gpu-rdc -std=c++14
/root/Code/mmcv_issue/mmcv/ops/csrc/pytorch/hip/roi_align_hip.hip:4:10: fatal error: 'pytorch_hip_helper.hpp' file not found
#include "pytorch_hip_helper.hpp"
  3. op_files is incomplete. I was finally able to compile the _ext module, but when I tried to import it after everything was done, I got this:
import mmcv._ext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dynamic module does not define module export function (PyInit__ext) 

So I compared the temporary build directory with the one from a CUDA environment and found that the *.cpp files under mmcv/ops/csrc/pytorch/ are also compiled in the CUDA environment. The ROCm branch of setup.py does not add them to op_files, so the *.cpp files under mmcv/ops/csrc/pytorch/ are never compiled. The build still completes successfully, but importing mmcv._ext then fails with the error above.
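
The manual include renaming described in the first problem (stale CUDA include names left behind by hipify()) could probably be scripted instead of patched by hand. Below is a minimal sketch, assuming that every quoted include whose file name contains _cuda_ should point at the _hip_ counterpart. That rule is only inferred from the two examples in this issue (ball_query_cuda_kernel.cuh and pytorch_cuda_helper.hpp) and may not hold for every file. fix_hip_includes and rewrite_tree are hypothetical helper names, not part of mmcv or PyTorch.

```python
import re
from pathlib import Path

# Matches quoted includes such as:  #include "ball_query_cuda_kernel.cuh"
# Angle-bracket includes (e.g. <cuda_runtime.h>) are deliberately left alone.
_INCLUDE_RE = re.compile(r'(#include\s+")([^"]*_cuda_[^"]*)(")')


def fix_hip_includes(source: str) -> str:
    """Rename *_cuda_* quoted includes to *_hip_* in hipified source text."""
    return _INCLUDE_RE.sub(
        lambda m: m.group(1) + m.group(2).replace("_cuda_", "_hip_") + m.group(3),
        source,
    )


def rewrite_tree(root: str, patterns=("*.hip", "*.hpp", "*.cuh")) -> list:
    """Apply fix_hip_includes to matching files under root.

    Returns the paths of the files that were actually modified.
    """
    changed = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            text = path.read_text()
            fixed = fix_hip_includes(text)
            if fixed != text:
                path.write_text(fixed)
                changed.append(str(path))
    return changed
```

Under these assumptions, calling rewrite_tree on mmcv/ops/csrc/common/hip and mmcv/ops/csrc/pytorch/hip after hipify() has run would cover the cases shown above. Anything hipify() regenerates on the next build would need the same pass again.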

Reproduction

  1. What command, code, or script did you run?
MMCV_WITH_OPS=1 python3 setup.py develop
  2. Did you make any modifications on the code? Did you understand what you have modified?

Environment

'sys.platform': 'linux', 'Python': '3.8.11 (default, Aug  3 2021, 15:09:35) [GCC 7.5.0]', 'CUDA available': True, 'GPU 0': 'Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]', 'CUDA_HOME': '/opt/rocm-4.3.0', 'NVCC': '', 'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0', 'PyTorch': '1.9.0a0+gitdfbd030', 'PyTorch compiling details': 'PyTorch built with:\n  - GCC 9.3\n  - C++ Version: 201402\n  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)\n  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n  - NNPACK is enabled\n  - CPU capability usage: AVX2\n  - HIP Runtime 40321.30\n  - MIOpen 2.12.0\n  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n', 'TorchVision': '0.10.0a0+ca1a620', 'OpenCV': '4.5.3', 'MMCV': '1.3.14', 'MMCV Compiler': 'GCC 9.3', 'MMCV CUDA 
Compiler': 'rocm not vailable'


Bug fix

  1. Manual changes. After the first compile (or after running hipify manually), hipified code is generated in mmcv/ops/csrc/common/hip and mmcv/ops/csrc/pytorch/hip. I then commented out this code in setup.py:
# from torch.utils.hipify import hipify_python

            # hipify_python.hipify(
            #     project_directory=project_dir,
            #     output_directory=project_dir,
            #     includes='mmcv/ops/csrc/*',
            #     show_detailed=True,
            #     is_pytorch_extension=True,
            # )

Otherwise, hipify() will overwrite your changes on the next compilation. After hipify(), for example, the file ball_query_hip.hip contains #include "ball_query_cuda_kernel.cuh", which needs to be changed to #include "ball_query_hip_kernel.cuh".
  2. Copy the required files from the common/ path to the common/hip/ path, except for parrots*.hpp and *_cuda_*.hpp.
  3. In setup.py, I changed op_files = glob.glob('./mmcv/ops/csrc/pytorch/hip/*') to op_files = glob.glob('./mmcv/ops/csrc/pytorch/*.cpp') + glob.glob('./mmcv/ops/csrc/pytorch/hip/*'), which finally compiles the _ext module successfully.
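
Steps 2 and 3 can be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual setup.py change: the issue does not say which files count as "required", so the sketch simply copies every *.hpp from common/ except the two excluded patterns. copy_common_headers and collect_rocm_op_files are hypothetical helper names.

```python
import fnmatch
import glob
import os
import shutil


def copy_common_headers(common_dir: str, hip_dir: str) -> list:
    """Copy *.hpp headers from common/ into common/hip/ (step 2).

    Skips parrots*.hpp and *_cuda_*.hpp, as described in the issue.
    Assumption: all remaining *.hpp files are "required".
    """
    os.makedirs(hip_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(common_dir, "*.hpp"))):
        name = os.path.basename(path)
        if fnmatch.fnmatch(name, "parrots*.hpp") or fnmatch.fnmatch(name, "*_cuda_*.hpp"):
            continue
        shutil.copy2(path, os.path.join(hip_dir, name))
        copied.append(name)
    return copied


def collect_rocm_op_files(csrc_pytorch_dir: str) -> list:
    """Return the *.cpp files in pytorch/ plus everything under pytorch/hip/ (step 3)."""
    cpp_files = glob.glob(os.path.join(csrc_pytorch_dir, "*.cpp"))
    hip_files = glob.glob(os.path.join(csrc_pytorch_dir, "hip", "*"))
    return sorted(cpp_files) + sorted(hip_files)
```

The second helper mirrors the two-glob expression from step 3. Including the *.cpp files matters because, as described earlier, leaving them out still lets the build finish but produces an _ext module that fails on import with the PyInit__ext error.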

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
gakkispy commented, Dec 20, 2021

Hi @gakkispy , did you resolve the error?

Yes, I manually changed the setup.py code and can now use mmcv-full smoothly.

0 reactions
zhouzaida commented, Dec 19, 2021

Thank you very much for your contribution, your bug fix is very helpful to us. We will create a PR to resolve the error in the next few days.
