ImportError: Extension horovod.torch has not been built

from horovod.tensorflow import allreduce_async_, synchronize

The program fails at the line above. The error output is below:

Traceback (most recent call last):
  File "/GPUFS/nudt_chkwu_2/kfhu/horovod-0.19.2/horovod/torch/__init__.py", line 32, in <module>
    __file__, 'mpi_lib_v2')
  File "/GPUFS/nudt_chkwu_2/kfhu/horovod-0.19.2/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built.  If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/GPUFS/nudt_chkwu_2/kfhu/horovod-0.19.2/horovod/torch/__init__.py", line 35, in <module>
    __file__, 'mpi_lib', '_mpi_lib')
  File "/GPUFS/nudt_chkwu_2/kfhu/horovod-0.19.2/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built.  If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

Can you suggest a fix? I appreciate your help!
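Before reinstalling with HOROVOD_WITH_PYTORCH=1 as the error message suggests, it may help to confirm which part is missing: the framework itself or the Horovod extension built against it. The snippet below is a minimal sketch, not part of Horovod; the function name diagnose_horovod and its messages are hypothetical, and it assumes the framework must be importable in the same environment where Horovod is built.

```python
import importlib
import importlib.util
from typing import Tuple


def diagnose_horovod(framework_pkg: str = "torch",
                     horovod_ext: str = "horovod.torch") -> Tuple[bool, str]:
    """Report whether the framework and the Horovod extension both import.

    Hypothetical helper for illustration only; it is not a Horovod API.
    """
    # Step 1: the framework must be present, otherwise Horovod has
    # nothing to build its extension against.
    if importlib.util.find_spec(framework_pkg) is None:
        return False, f"{framework_pkg} is not installed"

    # Step 2: importing the extension triggers Horovod's own build check,
    # which raises ImportError when the extension was not built.
    try:
        importlib.import_module(horovod_ext)
    except ImportError as exc:
        return False, f"{horovod_ext} failed to import: {exc}"

    return True, f"{horovod_ext} imported successfully"


if __name__ == "__main__":
    ok, message = diagnose_horovod()
    print(message)
```

If the first check fails, installing the framework and then reinstalling Horovod is the likely next step; if only the second check fails, the reinstall with HOROVOD_WITH_PYTORCH=1 should surface the underlying build error.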

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
KevvinHoo commented, May 28, 2020

There is no problem when I use compression = hvd.Compression.fp16 if args.fp16_allreduce else hvd.Compression.none, which is used in the example program provided by Horovod, instead of grc = Allgather(TopKCompressor(0.3), ResidualMemory(), hvd.size()). The bug you mentioned above was already fixed before I ran the training script. It is true that I haven't applied the patch, as I don't know how to do it. Could you tell me the details about this patch?

Best regards

0 reactions
hangxu0304 commented, May 28, 2020

No need. Just modify the related PyTorch files.

On May 28, 2020, at 16:12, Tonyhukaifan notifications@github.com wrote:

Never mind - I have read the file named horovd 0.18.2-patch, and I think I know how to apply this patch. The - lines indicate deletions and the + lines indicate additions, right? And do I need to change the TensorFlow module files when I use the PyTorch framework?
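On the question about - and + lines: in the common unified diff format, a line starting with - is removed from the old file and a line starting with + is added to the new file, while lines starting with a space are unchanged context. The sketch below generates such a diff with Python's standard difflib module. The file names and the old_call/new_call contents are made up for illustration and are unrelated to the actual Horovod patch.

```python
import difflib

# Hypothetical "before" and "after" versions of a small source file.
before = ["import torch\n", "x = old_call()\n", "print(x)\n"]
after = ["import torch\n", "x = new_call()\n", "print(x)\n"]

# unified_diff yields header lines (---, +++, @@) followed by the body.
diff = list(difflib.unified_diff(
    before, after, fromfile="a/example.py", tofile="b/example.py"))

# Body lines starting with "-" are deletions, "+" are additions.
# The "---" and "+++" header lines are excluded explicitly.
removed = [line for line in diff
           if line.startswith("-") and not line.startswith("---")]
added = [line for line in diff
         if line.startswith("+") and not line.startswith("+++")]

print("".join(diff))
```

Applying a patch by hand means making exactly those changes: delete each - line and insert each + line at the position the surrounding context lines indicate.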


Top Results From Across the Web

ImportError: Extension horovod.torch has not been built · Issue ...
I have run the following command to test horovod pytorch frame, the error occurs: jovyan@560c5fd869da:~$ mpirun -np 1 -bind-to none -map-by ...

ImportError: Extension horovod.tensorflow has not been built
Rebuilding Horovod using the GitHub instructions solved my issue. I cloned the repo in the folder I had the program I was trying...

Troubleshooting - Horovod documentation - Read the Docs
If you see the error message below, it means that TensorFlow is not installed. Please install TensorFlow before installing Horovod. error: import tensorflow ......

Distributed training - Azure Databricks - Microsoft Learn
Problem: Importing horovod.{torch|tensorflow} raises ImportError: Extension horovod.{torch|tensorflow} has not been built.

Distributed training | Databricks on AWS
The error indicates that Horovod was installed before a required library (PyTorch or TensorFlow). Since Horovod is compiled during installation, ...
