AssertionError: Default process group is not initialized

Describe the bug

Command: python tools/train.py configs/danet/danet_r50-d8_512x1024_40k_cityscapes.py

Training a model on custom data fails with AssertionError: Default process group is not initialized. The GPU is already running two object detection networks; could that be the reason? mmdetection can train multiple networks simultaneously.

Environment info

sys.platform: linux
Python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0: Tesla V100-PCIE-32GB
GCC: gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel® Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel® 64 architecture applications
  • Intel® MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.2.0
MMCV: 1.0.2
MMSegmentation: 0.5.0+b72a6d0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8

Top GitHub Comments

6 reactions
xvjiarui commented, Jul 11, 2020

Hi @HaoweiGis, if you would like to debug with non-distributed training, you need to change SyncBN to BN, since PyTorch's SyncBN requires distributed training.
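The change described above can be sketched in plain Python. This is only an illustration, assuming the model config is a nested dict whose normalization entries look like norm_cfg=dict(type='SyncBN', requires_grad=True); the helper name sync_bn_to_bn and the trimmed-down model dict are hypothetical and not part of MMSegmentation.

```python
import copy


def sync_bn_to_bn(cfg):
    """Return a deep copy of cfg with every type='SyncBN' entry set to type='BN'.

    Hypothetical helper for illustration; it is not an MMSegmentation API.
    """
    cfg = copy.deepcopy(cfg)

    def _walk(node):
        # Visit nested dicts, lists, and tuples, rewriting matching entries in place.
        if isinstance(node, dict):
            if node.get("type") == "SyncBN":
                node["type"] = "BN"
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


# Assumed, trimmed-down model config; a real config has many more keys.
model = dict(
    type="EncoderDecoder",
    backbone=dict(type="ResNetV1c", norm_cfg=dict(type="SyncBN", requires_grad=True)),
    decode_head=dict(type="DAHead", norm_cfg=dict(type="SyncBN", requires_grad=True)),
)

patched = sync_bn_to_bn(model)
print(patched["backbone"]["norm_cfg"])
```

Editing the config file by hand achieves the same thing; the helper is only convenient when the same config should stay usable for distributed runs, because it leaves the original dict untouched.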

0 reactions
xiexinch commented, Apr 20, 2021

Facing the same issue. Which file under the configs folder?

Hi @PriyankaJain-1998, it is in each configs/_base_/models/xxx.py file. You can also run tools/dist_train.sh by setting GPUS=1, like ./tools/dist_train.sh config.py 1
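Running through a distributed launcher helps, even with one GPU, because the launcher provides the environment that PyTorch's env:// initialization reads (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE), so the default process group can be created. A minimal pre-flight sketch, assuming those four variables are the signal; the function names are hypothetical, and the 'pytorch' and 'none' strings mirror the values of the --launcher option of tools/train.py:

```python
import os

# Variables read by PyTorch's env:// initialization of the default process group.
REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")


def missing_dist_vars(env=None):
    """List the distributed variables that are absent from env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if name not in env]


def pick_launcher(env=None):
    """Suggest 'pytorch' when a launcher prepared the environment, otherwise 'none'."""
    return "none" if missing_dist_vars(env) else "pytorch"


if __name__ == "__main__":
    missing = missing_dist_vars()
    if missing:
        # Plain "python tools/train.py ..." lands here: SyncBN layers will not work.
        print("non-distributed run, missing:", ", ".join(missing))
    else:
        print("distributed environment detected")
```

When the check reports a non-distributed run, either switch SyncBN to BN in the config or start training through the launcher script instead.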

Top Results From Across the Web

Default process group is not initialized · Issue #131 · mapillary ...
And I have tried run it on both 1 GPU and 2 GPUs but got the same problem. Could u give me some...

RuntimeError: Default process group has not been initialized ...
I'm training the model with DistributedDataParallel and made weight file. Then trying to load the pth file with model and eval

PyTorch distributed training error AssertionError: Default process group is not ...
Default process group has not been initialized, please make sure to call init_process_group.

AssertionError: Default process group is not initialized - 马春杰杰
AssertionError: Default process group is not initialized ... Solution: change every SyncBN in the config file to BN. This article was last updated on October 19, 2021, ...

pytorch-mmsegmentation: AssertionError: Default ... encountered during training
pytorch-mmsegmentation: AssertionError: Default process group is not initialized encountered during training (PhDing_H's blog)
