"RuntimeError: torch.distributed is not yet initialized but process group is requested" when trying to run API
❓ Questions and Help
After following the setup steps I ran metaseq-api-local and got this output:
$ metaseq-api-local
Traceback (most recent call last):
File "/home/jliu/openpretrainedtransformer/metaseq/metaseq/service/constants.py", line 17, in <module>
from metaseq_internal.constants import LOCAL_SSD, MODEL_SHARED_FOLDER
ModuleNotFoundError: No module named 'metaseq_internal'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jliu/miniconda3/envs/conda_env_opt/bin/metaseq-api-local", line 33, in <module>
sys.exit(load_entry_point('metaseq', 'console_scripts', 'metaseq-api-local')())
File "/home/jliu/miniconda3/envs/conda_env_opt/bin/metaseq-api-local", line 25, in importlib_load_entry_point
return next(matches).load()
File "/home/jliu/miniconda3/envs/conda_env_opt/lib/python3.9/importlib/metadata.py", line 86, in load
module = import_module(match.group('module'))
File "/home/jliu/miniconda3/envs/conda_env_opt/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/home/jliu/openpretrainedtransformer/metaseq/metaseq_cli/interactive_hosted.py", line 31, in <module>
from metaseq.service.constants import (
File "/home/jliu/openpretrainedtransformer/metaseq/metaseq/service/constants.py", line 40, in <module>
raise RuntimeError(
RuntimeError: You must set the variables in metaseq.service.constants to launch the API.
Am I missing a step? I tried manually setting LOCAL_SSD and MODEL_SHARED_FOLDER to a new folder I created, but then other things failed.
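For reference, setting those two variables by hand looks roughly like this (a sketch only; the directory paths are placeholders, and presumably the API also expects the OPT checkpoint and tokenizer files to actually live under MODEL_SHARED_FOLDER, which would explain the later failures):

```python
# Sketch of the manual edit in metaseq/service/constants.py.
# The paths below are placeholders for illustration, not values metaseq ships with.
import os

MODEL_SHARED_FOLDER = "/home/jliu/opt_models"  # where the downloaded OPT checkpoint shards would live
LOCAL_SSD = "/home/jliu/opt_cache"             # fast local scratch space for the service

# Both directories must exist before launching metaseq-api-local.
os.makedirs(MODEL_SHARED_FOLDER, exist_ok=True)
os.makedirs(LOCAL_SSD, exist_ok=True)
```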
- fairseq Version (e.g., 1.0 or master): followed setup.md
- PyTorch Version (e.g., 1.0): followed setup.md
- OS (e.g., Linux): Ubuntu
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): followed setup.md
- Python version: 3.9.12
- CUDA/cuDNN version: 11.3
- GPU models and configuration: Quadro RTX 5000
- Any other relevant information:
I still see the issue, any resolution?
I ran into the same problem, "RuntimeError: torch.distributed is not yet initialized but process group is requested". I followed the official setup instructions, but installed Apex last. After finishing all the instructions, I ran metaseq-api-local and got this error.
I am wondering whether the install order of the requirements could cause this error?
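For anyone debugging the torch.distributed error itself, a quick standalone check of the process-group state can help (a generic PyTorch sketch, not metaseq-specific; metaseq's launcher normally performs this initialization itself, and the host/port values below are placeholders):

```python
import os
import torch.distributed as dist

# Report whether a process group already exists in this process.
print("initialized:", dist.is_available() and dist.is_initialized())

# Minimal single-process initialization, just to confirm the backend works.
# MASTER_ADDR / MASTER_PORT are placeholder values for the default env:// init.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)
print("initialized after init_process_group:", dist.is_initialized())
dist.destroy_process_group()
```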