
"RuntimeError: torch.distributed is not yet initialized but process group is requested" when trying to run API

See original GitHub issue

❓ Questions and Help

After following the setup steps, I ran metaseq-api-local and got this output:

$ metaseq-api-local
Traceback (most recent call last):
  File "/home/jliu/openpretrainedtransformer/metaseq/metaseq/service/constants.py", line 17, in <module>
    from metaseq_internal.constants import LOCAL_SSD, MODEL_SHARED_FOLDER
ModuleNotFoundError: No module named 'metaseq_internal'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jliu/miniconda3/envs/conda_env_opt/bin/metaseq-api-local", line 33, in <module>
    sys.exit(load_entry_point('metaseq', 'console_scripts', 'metaseq-api-local')())
  File "/home/jliu/miniconda3/envs/conda_env_opt/bin/metaseq-api-local", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/jliu/miniconda3/envs/conda_env_opt/lib/python3.9/importlib/metadata.py", line 86, in load
    module = import_module(match.group('module'))
  File "/home/jliu/miniconda3/envs/conda_env_opt/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/jliu/openpretrainedtransformer/metaseq/metaseq_cli/interactive_hosted.py", line 31, in <module>
    from metaseq.service.constants import (
  File "/home/jliu/openpretrainedtransformer/metaseq/metaseq/service/constants.py", line 40, in <module>
    raise RuntimeError(
RuntimeError: You must set the variables in metaseq.service.constants to launch the API.

Am I missing a step? I tried manually setting LOCAL_SSD and MODEL_SHARED_FOLDER to a new folder I created, but then other things failed.
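For context, the variables named in the traceback live in metaseq/service/constants.py, which normally imports them from the private metaseq_internal package. A minimal sketch of setting them by hand is below; the paths are placeholders I chose for illustration, not metaseq's defaults, and other constants in that file may also need values:

```python
# Sketch of edits to metaseq/service/constants.py when metaseq_internal
# is unavailable (paths below are placeholders, not project defaults).
MODEL_SHARED_FOLDER = "/data/opt_models"  # folder holding the downloaded checkpoint shards
LOCAL_SSD = "/data/opt_scratch"           # fast local scratch space used by the API server
```

As the reporter notes, setting only these two variables may not be enough; later failures (such as the torch.distributed error in the title) can still occur.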

  • fairseq Version (e.g., 1.0 or master): followed setup.md
  • PyTorch Version (e.g., 1.0): followed setup.md
  • OS (e.g., Linux): Ubuntu
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): followed setup.md
  • Python version: 3.9.12
  • CUDA/cuDNN version: 11.3
  • GPU models and configuration: Quadro RTX 5000
  • Any other relevant information:

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 23 (5 by maintainers)

Top GitHub Comments

6 reactions
seelam commented on May 11, 2022

I still see the issue, any resolution?

1 reaction
stevenkwong commented on May 17, 2022

This is so strange. Can anyone provide the command they are running?

I hit the same problem, "RuntimeError: torch.distributed is not yet initialized but process group is requested". I just followed the official setup instructions, installing Apex last. After finishing all the instructions I ran metaseq-api-local, and this error came up.

I am wondering whether the install order of the requirements could cause this error?

Read more comments on GitHub

Top Results From Across the Web

Distributed communication package - torch.distributed - PyTorch
Once torch.distributed.init_process_group() has run, the following functions can be used. To check whether the process group has already been initialized, use torch.distributed.is_initialized().

How to solve dist.init_process_group from hanging (or ...)
The following fixes are based on Writing Distributed Applications with PyTorch, Initialization Methods.

Distributed communication package - torch.distributed
Another initialization method makes use of a file system that is shared and visible from all machines in a group, along with a ...

Writing Distributed Applications with PyTorch
As opposed to the multiprocessing (torch.multiprocessing) package, processes can use different communication backends and are not restricted to being ...

Modify a PyTorch Training Script - Amazon SageMaker
Use the SageMaker Distributed Data Parallel Library as the backend of torch.distributed ... When you initialize the PyTorch distributed process group using the ...
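Tying the results above together: the error in the issue title means some code asked for the default process group before torch.distributed.init_process_group() was ever called. A minimal single-process sketch of the check-then-initialize pattern is below; the backend, port, and helper name are my choices for illustration, not anything metaseq itself defines:

```python
import os
import torch.distributed as dist

def ensure_process_group() -> int:
    """Initialize a one-process gloo group if none exists yet, and return the world size."""
    if not dist.is_initialized():
        # The env:// rendezvous reads these four variables; values here
        # describe a single local process (placeholder port).
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29511")
        os.environ.setdefault("RANK", "0")
        os.environ.setdefault("WORLD_SIZE", "1")
        dist.init_process_group(backend="gloo", init_method="env://")
    return dist.get_world_size()
```

Calling something like this before any code that requests the default process group avoids the RuntimeError in a single-machine setup, though metaseq's own launch path is the supported way to set up its process group.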
