HOROVOD_WITH_MXNET=1 to debug the build error. Can anybody help me?
I ran run.sh, but I get the error shown further down. I have tried mxnet 1.6.0 and mxnet-cu101, but neither works. The output of horovodrun --check looks like this:
Horovod v0.19.2:
Available Frameworks:
[X] TensorFlow
[X] PyTorch
[ ] MXNet
Available Controllers:
[X] MPI
[X] Gloo
Available Tensor Operations:
[X] NCCL
[ ] DDL
[ ] CCL
[X] MPI
[X] Gloo
- My CUDA version is 10.2. Is my CUDA version wrong?
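For anyone who wants to check this kind of report programmatically, here is a minimal sketch that parses the check output pasted above and lists which frameworks were not built. The function name `parse_check_build` and the `SAMPLE` string are made up for illustration; the parsing only assumes the `[X]` / `[ ]` layout shown in this post, not any documented output format.

```python
import re

# Sample text copied from the check output in this post.
SAMPLE = """\
Horovod v0.19.2:
Available Frameworks:
[X] TensorFlow
[X] PyTorch
[ ] MXNet
Available Controllers:
[X] MPI
[X] Gloo
Available Tensor Operations:
[X] NCCL
[ ] DDL
[ ] CCL
[X] MPI
[X] Gloo
"""


def parse_check_build(text):
    """Return {section: {item: built?}} for text laid out like SAMPLE.

    Hypothetical helper: it assumes section headers end with ':' and
    items look like '[X] Name' (built) or '[ ] Name' (not built).
    """
    result = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        item = re.match(r"^\[(X| )\]\s+(.+)$", line)
        if item and section is not None:
            result[section][item.group(2)] = item.group(1) == "X"
        elif line.endswith(":") and not line.startswith("Horovod"):
            # A new section such as 'Available Frameworks:'
            section = line[:-1]
            result[section] = {}
    return result


frameworks = parse_check_build(SAMPLE)["Available Frameworks"]
missing = [name for name, built in frameworks.items() if not built]
print("Frameworks not built:", missing)
```

With the output above, the only framework reported as not built is MXNet, which matches the ImportError below.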
When I run run.sh, the problem looks like this:
[ps-SYS-4028GR-TR:13182] Warning: could not find environment variable "LD_LIBRARY_PATH"
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[35340,1],1]
Exit code: 1
Issue Analytics
- State:
- Created 3 years ago
- Comments:7
Top GitHub Comments
Maybe you have the CPU version of MXNet installed. The specific version we use is [mxnet-cu101 1.6.0.post0]. You can check this.
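One way to check this is sketched below. It guesses whether an installed MXNet pip package is a CPU or a GPU build from its distribution name. The helper names are hypothetical and the name patterns are a heuristic based on the packages mentioned in this thread (`mxnet`, `mxnet-cu101`), not an official list. Note that `importlib.metadata` needs Python 3.8+, so on the Python 3.6 environment from the traceback you would pass in the names from `pip list` by hand instead.

```python
import re


def classify_mxnet_package(dist_name):
    """Guess (flavor, cuda_version) from a pip distribution name.

    Heuristic only: names like 'mxnet-cu101' are treated as GPU builds
    for CUDA 10.1, plain 'mxnet' (and 'mxnet-mkl') as CPU builds.
    """
    name = dist_name.lower()
    gpu = re.fullmatch(r"mxnet-cu(\d{2,3})(mkl)?", name)
    if gpu:
        tag = gpu.group(1)              # e.g. '101'
        return "gpu", tag[:-1] + "." + tag[-1]
    if name in ("mxnet", "mxnet-mkl"):
        return "cpu", None
    return "unknown", None


def installed_mxnet_packages():
    """List installed distributions whose name starts with 'mxnet'.

    Returns an empty list where importlib.metadata is unavailable
    (Python < 3.8).
    """
    try:
        from importlib.metadata import distributions
    except ImportError:
        return []
    names = set()
    for dist in distributions():
        name = dist.metadata["Name"] or ""
        if name.lower().startswith("mxnet"):
            names.add(name)
    return sorted(names)


# The two packages mentioned in the question:
for pkg in ("mxnet", "mxnet-cu101"):
    print(pkg, "->", classify_mxnet_package(pkg))

# Whatever is installed in the current environment (may be empty):
for pkg in installed_mxnet_packages():
    print("installed:", pkg, "->", classify_mxnet_package(pkg))
```

If both a CPU and a GPU package show up as installed, uninstalling both and installing only the GPU one is probably the safer route. After changing the MXNet package, Horovod usually has to be reinstalled too, with `HOROVOD_WITH_MXNET=1` set as the error message suggests, so that a failed MXNet extension build shows up as a build error instead of being skipped.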
Thank you so much! Have a good day!