Getting error: ImportError: cannot import name 'ProcessGroup' from 'torch.distributed'
Discussed in https://github.com/facebookresearch/detectron2/discussions/4549
<div type='discussions-op-text'>Originally posted by hqm on September 18, 2022. I am trying to run detectron2 on an NVIDIA Jetson ARM-based system using a Docker container.
I am using a prebuilt PyTorch container image from NVIDIA, nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.12-py3, as the base image. It has these versions prebuilt and installed:
l4t-pytorch:r35.1.0-pth1.12-py3
PyTorch v1.12.0
torchvision v0.13.0
torchaudio v0.12.0
I’m installing detectron2 with these commands:
RUN git clone https://github.com/facebookresearch/detectron2.git
RUN pip3 install -e detectron2
When I try to run a detectron2 application, I get this error: “cannot import name ‘ProcessGroup’ from ‘torch.distributed’”.
I’ve read online that PyTorch in this situation was probably not compiled with the ‘distributed’ option, so building it with that option might fix this. I would rather not build it from scratch, though, given that this container ships a nice prebuilt version.
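One way to test the "not compiled with distributed" theory before rebuilding anything is to ask the installed PyTorch directly. The sketch below is only a diagnostic, not a fix: `describe_torch_distributed` is a hypothetical helper name, and the only PyTorch calls it relies on are `torch.__version__` and `torch.distributed.is_available()`. It returns a report rather than raising, so it also runs on a machine without PyTorch.

```python
import importlib


def describe_torch_distributed():
    """Report whether the installed PyTorch build exposes torch.distributed.

    Hypothetical helper for illustration; it never raises on a missing install.
    """
    report = {
        "torch_installed": False,
        "torch_version": None,
        "distributed_available": False,
        "has_process_group": False,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report  # PyTorch itself is missing
    report["torch_installed"] = True
    report["torch_version"] = str(torch.__version__)

    try:
        dist = importlib.import_module("torch.distributed")
    except ImportError:
        return report  # the distributed package could not be imported at all

    # is_available() is False when PyTorch was built without distributed support.
    report["distributed_available"] = bool(dist.is_available())
    # This is the name that fairscale fails to import in the trace below.
    report["has_process_group"] = hasattr(dist, "ProcessGroup")
    return report


if __name__ == "__main__":
    for key, value in describe_torch_distributed().items():
        print(f"{key}: {value}")
```

If `distributed_available` comes back `False`, the build in the container most likely lacks distributed support, which would be consistent with the missing `ProcessGroup` name.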
Stack trace below:
  File "vision/server.py", line 45, in <module>
    from processor import VisionProcessor
  File "/home/work/vision/processor.py", line 28, in <module>
    from predictors import *
  File "/home/work/vision/predictors.py", line 27, in <module>
    from detectron2.engine.defaults import DefaultPredictor
  File "/home/detectron2/detectron2/engine/__init__.py", line 12, in <module>
    from .defaults import *
  File "/home/detectron2/detectron2/engine/defaults.py", line 38, in <module>
    from detectron2.modeling import build_model
  File "/home/detectron2/detectron2/modeling/__init__.py", line 5, in <module>
    from .backbone import (
  File "/home/detectron2/detectron2/modeling/backbone/__init__.py", line 15, in <module>
    from .vit import ViT, SimpleFeaturePyramid, get_vit_lr_decay_rate
  File "/home/detectron2/detectron2/modeling/backbone/vit.py", line 10, in <module>
    from fairscale.nn.checkpoint import checkpoint_wrapper
  File "/usr/local/lib/python3.8/dist-packages/fairscale/__init__.py", line 12, in <module>
    from . import nn
  File "/usr/local/lib/python3.8/dist-packages/fairscale/nn/__init__.py", line 9, in <module>
    from .data_parallel import FullyShardedDataParallel, ShardedDataParallel
  File "/usr/local/lib/python3.8/dist-packages/fairscale/nn/data_parallel/__init__.py", line 8, in <module>
    from .fully_sharded_data_parallel import (
  File "/usr/local/lib/python3.8/dist-packages/fairscale/nn/data_parallel/fully_sharded_data_parallel.py", line 38, in <module>
    from torch.distributed import ProcessGroup
ImportError: cannot import name 'ProcessGroup' from 'torch.distributed' (/usr/local/lib/python3.8/dist-packages/torch/distributed/__init__.py)
Does anyone know what causes this? Should I use PyTorch v1.11.0 instead of v1.12.0? Or is there a pinned version of detectron2 I should install?
</div>
Top GitHub Comments
It’s a fairscale bug: https://github.com/facebookresearch/fairscale/issues/1057
The fairscale maintainers implemented a hotfix. You can install the new fairscale version from source as described here: https://github.com/facebookresearch/fairscale/blob/main/docs/source/installation_instructions.rst. That solved the issue for me.
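After reinstalling fairscale from source, a quick check is to retry the exact import that detectron2's `vit.py` performs in the stack trace. The snippet below is a minimal sketch: `try_import` is a hypothetical helper, not part of fairscale or detectron2, and it only reports whether the import succeeds, without confirming which fairscale version is installed.

```python
import importlib


def try_import(module_name, attribute):
    """Return (True, "") if `attribute` can be loaded from `module_name`,
    otherwise (False, "<ErrorType>: <message>").

    Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(module_name)
        getattr(module, attribute)
    except (ImportError, AttributeError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


if __name__ == "__main__":
    # Same import that fails in detectron2/modeling/backbone/vit.py.
    ok, error = try_import("fairscale.nn.checkpoint", "checkpoint_wrapper")
    if ok:
        print("fairscale import OK")
    else:
        print(f"fairscale import still failing: {error}")
```

If the message still mentions `ProcessGroup`, the old fairscale package is probably still the one being picked up from `/usr/local/lib/python3.8/dist-packages`.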