ImportError: Extension horovod.torch has not been built
from horovod.tensorflow import allreduce_async_, synchronize
The program stops at the line above. The error output is below:
Traceback (most recent call last):
  File "/GPUFS/nudt_chkwu_2/kfhu/horovod-0.19.2/horovod/torch/__init__.py", line 32, in <module>
    __file__, 'mpi_lib_v2')
  File "/GPUFS/nudt_chkwu_2/kfhu/horovod-0.19.2/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/GPUFS/nudt_chkwu_2/kfhu/horovod-0.19.2/horovod/torch/__init__.py", line 35, in <module>
    __file__, 'mpi_lib', '_mpi_lib')
  File "/GPUFS/nudt_chkwu_2/kfhu/horovod-0.19.2/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
Can you suggest a fix? I appreciate your help!
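One way to narrow this down is to check which copy of the package Python is actually importing. The traceback paths point into a directory named horovod-0.19.2, which may mean the interpreter is picking up an unbuilt source checkout rather than an installed build; that is only a guess from the paths. The sketch below is a hypothetical helper (the name `diagnose` is not part of Horovod) that reports where a top-level package resolves from and whether a given submodule imports cleanly. It uses only the standard library.

```python
import importlib
import importlib.util


def diagnose(module_name):
    """Report where the top-level package resolves from and whether the module imports.

    Hypothetical helper for illustration; not part of Horovod.
    """
    top_level = module_name.split(".")[0]
    try:
        # find_spec on a top-level name locates the package without importing it.
        spec = importlib.util.find_spec(top_level)
    except (ImportError, ValueError):
        spec = None
    origin = spec.origin if spec is not None else None

    try:
        importlib.import_module(module_name)
        error = None
    except ImportError as exc:
        # Keep the message: Horovod's own text names the build flag to use.
        error = str(exc)

    return {"module": module_name, "origin": origin, "error": error}
```

Calling `diagnose("torch")` and then `diagnose("horovod.torch")` shows whether PyTorch itself is importable and which directory the `horovod` package is loaded from. If `origin` points into the source tree, running Python from a different working directory, or reinstalling with `HOROVOD_WITH_PYTORCH=1` as the error message suggests, are the usual next steps to try.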
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
There is no problem when I use the following line, which comes from the example program shipped with Horovod:
compression = hvd.Compression.fp16 if args.fp16_allreduce else hvd.Compression.none
instead of:
grc = Allgather(TopKCompressor(0.3), ResidualMemory(), hvd.size())
The bug you mentioned above was already fixed before I ran the training script. It is true that I haven't applied the patch, as I don't know how to do it. Could you tell me the details of this patch?
Best regards
No need. Just modify the relevant PyTorch files.