question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

build rocm JAX failed

See original GitHub issue

Please:

  • Check for duplicate issues.
  • Provide a complete example of how to reproduce the bug, wrapped in triple backticks like this:

I use singularity to build an image and follow the README.md using the command like below.

./build_rocm.sh
  • If applicable, include full error messages/tracebacks. these are all the outputs.
Singularity> ./build_rocm.sh
+ ROCM_TF_FORK_REPO=https://github.com/ROCmSoftwarePlatform/tensorflow-upstream
+ ROCM_TF_FORK_BRANCH=develop-upstream
+ ROCM_PATH=/opt/rocm/
+ python3 ../build.py --enable_rocm --rocm_path=/opt/rocm/ --bazel_options=--override_repository=org_tensorflow=/tmp/tensorflow-upstream

     _   _  __  __
    | | / \ \ \/ /
 _  | |/ _ \ \  /
| |_| / ___ \/  \
 \___/_/   \/_/\_\


Bazel binary path: ./bazel-5.1.0-linux-x86_64
Bazel version: 5.1.0
Python binary path: /opt/conda/bin/python3
Python version: 3.9
NumPy version: 1.20.3
MKL-DNN enabled: yes
Target CPU: x86_64
Target CPU features: release
CUDA enabled: no
TPU enabled: no
ROCm enabled: yes
ROCm toolkit path: /opt/rocm/
ROCm amdgpu targets: gfx900,gfx906,gfx908,gfx90a,gfx1030

Building XLA and installing it in the jaxlib source tree...
./bazel-5.1.0-linux-x86_64 run --verbose_failures=true --override_repository=org_tensorflow=/tmp/tensorflow-upstream --config=avx_posix --config=mkl_open_source_only --config=rocm :build_wheel -- --output_path=/home/code/jax/build/rocm/dist --cpu=x86_64
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'run' from /home/code/jax/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'run' from /home/code/jax/.bazelrc:
  Inherited 'build' options: --apple_platform_type=macos --macos_minimum_os=10.9 --announce_rc --define open_source_build=true --spawn_strategy=standalone --enable_platform_specific_config --experimental_cc_shared_library --define=no_aws_support=true --define=no_gcp_support=true --define=no_hdfs_support=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true -c opt --config=short_logs --copt=-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir.
INFO: Reading rc options for 'run' from /home/code/jax/.jax_configure.bazelrc:
  Inherited 'build' options: --strategy=Genrule=standalone --repo_env PYTHON_BIN_PATH=/opt/conda/bin/python3 --action_env=PYENV_ROOT --python_path=/opt/conda/bin/python3 --action_env ROCM_PATH=/opt/rocm/ --distinct_host_configuration=false
INFO: Found applicable config definition build:short_logs in file /home/code/jax/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:avx_posix in file /home/code/jax/.bazelrc: --copt=-mavx --host_copt=-mavx
INFO: Found applicable config definition build:mkl_open_source_only in file /home/code/jax/.bazelrc: --define=tensorflow_mkldnn_contraction_kernel=1
INFO: Found applicable config definition build:rocm in file /home/code/jax/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm=true --define=using_rocm_hipcc=true --define=xla_python_enable_gpu=true --repo_env TF_NEED_ROCM=1 --action_env TF_ROCM_AMDGPU_TARGETS=gfx900,gfx906,gfx908
INFO: Found applicable config definition build:rocm in file /home/code/jax/.jax_configure.bazelrc: --action_env TF_ROCM_AMDGPU_TARGETS=gfx900,gfx906,gfx908,gfx90a,gfx1030
INFO: Found applicable config definition build:linux in file /home/code/jax/.bazelrc: --config=posix --copt=-Wno-stringop-truncation --copt=-Wno-array-parameter
INFO: Found applicable config definition build:posix in file /home/code/jax/.bazelrc: --copt=-fvisibility=hidden --copt=-Wno-sign-compare --cxxopt=-std=c++14 --host_cxxopt=-std=c++14
Loading:
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/2123408fb43a5c4afdf87dafd67117d9c0ff70cd.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
Loading: 0 packages loaded
Loading: 0 packages loaded
INFO: Repository local_config_rocm instantiated at:
  /home/code/jax/WORKSPACE:31:14: in <toplevel>
  /root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/tensorflow/workspace2.bzl:869:19: in workspace
  /root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/tensorflow/workspace2.bzl:101:19: in _tf_toolchains
Repository rule rocm_configure defined at:
  /root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl:843:33: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_rocm':
   Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 824, column 38, in _rocm_autoconf_impl
                _create_local_rocm_repository(repository_ctx)
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 550, column 35, in _create_local_rocm_repository
                rocm_config = _get_rocm_config(repository_ctx, bash_bin, find_rocm_config_script)
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 398, column 30, in _get_rocm_config
                config = find_rocm_config(repository_ctx, find_rocm_config_script)
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 376, column 41, in find_rocm_config
                exec_result = _exec_find_rocm_config(repository_ctx, script_path)
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 372, column 19, in _exec_find_rocm_config
                return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/remote_config/common.bzl", line 230, column 13, in execute
                fail(
Error in fail: Repository command failed
ERROR: ROCm version file not found in ['include/rocm-core/rocm_version.h', 'include/rocm_version.h']
ERROR: /home/code/jax/WORKSPACE:31:14: fetching rocm_configure rule //external:local_config_rocm: Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 824, column 38, in _rocm_autoconf_impl
                _create_local_rocm_repository(repository_ctx)
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 550, column 35, in _create_local_rocm_repository
                rocm_config = _get_rocm_config(repository_ctx, bash_bin, find_rocm_config_script)
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 398, column 30, in _get_rocm_config
                config = find_rocm_config(repository_ctx, find_rocm_config_script)
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 376, column 41, in find_rocm_config
                exec_result = _exec_find_rocm_config(repository_ctx, script_path)
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/gpus/rocm_configure.bzl", line 372, column 19, in _exec_find_rocm_config
                return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
        File "/root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/third_party/remote_config/common.bzl", line 230, column 13, in execute
                fail(
Error in fail: Repository command failed
ERROR: ROCm version file not found in ['include/rocm-core/rocm_version.h', 'include/rocm_version.h']
INFO: Repository bazel_skylib instantiated at:
  /home/code/jax/WORKSPACE:28:14: in <toplevel>
  /root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/org_tensorflow/tensorflow/workspace3.bzl:21:17: in workspace
Repository rule http_archive defined at:
  /root/.cache/bazel/_bazel_root/7777a22c05c38dcb5674638712ab6fae/external/bazel_tools/tools/build_defs/repo/http.bzl:353:31: in <toplevel>
ERROR: Skipping ':build_wheel': no such package '@local_config_rocm//rocm': Repository command failed
ERROR: ROCm version file not found in ['include/rocm-core/rocm_version.h', 'include/rocm_version.h']
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_rocm//rocm': Repository command failed
ERROR: ROCm version file not found in ['include/rocm-core/rocm_version.h', 'include/rocm_version.h']
INFO: Elapsed time: 77.870s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
ERROR: Build failed. Not running target
FAILED: Build did NOT complete successfully (0 packages loaded)
b''
Traceback (most recent call last):
  File "/home/code/jax/build/rocm/../build.py", line 527, in <module>
    main()
  File "/home/code/jax/build/rocm/../build.py", line 522, in main
    shell(command)
  File "/home/code/jax/build/rocm/../build.py", line 53, in shell
    output = subprocess.check_output(cmd)
  File "/opt/conda/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/conda/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['./bazel-5.1.0-linux-x86_64', 'run', '--verbose_failures=true', '--override_repository=org_tensorflow=/tmp/tensorflow-upstream', '--config=avx_posix', '--config=mkl_open_source_only', '--config=rocm', ':build_wheel', '--', '--output_path=/home/code/jax/build/rocm/dist', '--cpu=x86_64']' returned non-zero exit status 1.

Issue Analytics

  • State:closed
  • Created a year ago
  • Comments:5 (5 by maintainers)

github_iconTop GitHub Comments

1reaction
brettkooncecommented, Jun 9, 2022

I installed rocm 5.1.3, and have a similar failure during the build process as well:

Downloading bazel from: https://github.com/bazelbuild/bazel/releases/download/5.1.1/bazel-5.1.1
-linux-x86_64                                                                                  
                                                                                               
Bazel binary path: ./bazel-5.1.1-linux-x86_64                                                  
Bazel version: 5.1.1                                                                           
Python binary path: /usr/bin/python3                                                           
Python version: 3.9                                                                            
NumPy version: 1.19.5                                                                          
MKL-DNN enabled: yes                           
Target CPU: x86_64     
Target CPU features: release                   
CUDA enabled: no       
TPU enabled: no        
ROCm enabled: yes      
ROCm toolkit path: /opt/rocm-5.1.0             
ROCm amdgpu targets: gfx900,gfx906,gfx908,gfx90a,gfx1030  

Building XLA and installing it in the jaxlib source tree...
./bazel-5.1.1-linux-x86_64 run --verbose_failures=true --override_repository=org_tensorflow=/tmp/tensorflow-upstream --config=avx_posix --config=mkl_open_source_only --config=rocm :build_wheel -- --output_path=/workspace/dist --cpu=x86_64
b''
Traceback (most recent call last):
  File "/workspace/./build/build.py", line 528, in <module>
    main()
  File "/workspace/./build/build.py", line 523, in main
    shell(command)
  File "/workspace/./build/build.py", line 53, in shell
    output = subprocess.check_output(cmd)
  File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['./bazel-5.1.1-linux-x86_64', 'run', '--verbose_failures=true', '--override_repository=org_tensorflow=/tmp/tensorflow-upstream', '--config=avx_posix', '--config=mkl_open_source_only', '--config=rocm', ':build_wheel', '--', '--output_path=/workspace/dist', '--cpu=x86_64']' returned non-zero exit status 2.

0reactions
hawkinspcommented, Aug 15, 2022

I’m guessing this issue is fixed!

Read more comments on GitHub >

github_iconTop Results From Across the Web

rocm/jax - Docker Image
JAX -ROCm Docker Images: This repository contains the latest support for JAX in ROCm framework. Here's the recommended command to launch JAX-ROCm docker ......
Read more >
Building from source - JAX documentation
Building JAX involves two steps: Building or installing jaxlib , the C++ support library for jax . Installing the jax Python package.
Read more >
python-jax - AUR (en) - Arch Linux
Git Clone URL: https://aur.archlinux.org/python-jax.git (read-only, click to copy) ... FAILED: Build did NOT complete successfully ERROR: Build failed.
Read more >
Build from source - TensorFlow
Build a TensorFlow pip package from source and install it on Ubuntu Linux and macOS. While the instructions might work for other systems,...
Read more >
Error when tryind to build JAX in Windows - Stack Overflow
But I am getting an error when trying to build JAX. ... no ROCm enabled: no Building XLA and installing it in the...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found