ERROR train.py: Default process group is not initialized

I get this error when training on a single GPU, when calling the function distributed() to disable tqdm.

To avoid this, I simply wrapped distributed() like this:

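import torch.distributed as dist  # `dist` is assumed to be torch.distributed

def distributed():
    try:
        # True only if the distributed package is built in and a process group exists
        return dist.is_available() and dist.is_initialized()
    except Exception:
        # Treat any failure (e.g. an uninitialized default process group) as non-distributed
        return False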
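
For reference, here is a minimal sketch of how the same check could drive the tqdm decision without a try/except. It assumes `dist` is `torch.distributed`; the helper names `is_distributed`, `get_rank` and `should_disable_progress_bar` are hypothetical and do not come from the repository. The import is guarded only so the sketch also runs where PyTorch is not installed.

```python
try:
    import torch.distributed as dist  # assumed to be the `dist` used in the issue
except ImportError:  # lets the sketch run on machines without PyTorch
    dist = None


def is_distributed() -> bool:
    """True only if torch.distributed is usable and a process group exists."""
    return dist is not None and dist.is_available() and dist.is_initialized()


def get_rank() -> int:
    """Rank of this process, or 0 for a plain single-process run."""
    return dist.get_rank() if is_distributed() else 0


def should_disable_progress_bar() -> bool:
    """Hypothetical helper: show the progress bar on rank 0 only."""
    return get_rank() != 0


if __name__ == "__main__":
    print("distributed:", is_distributed())
    print("disable progress bar:", should_disable_progress_bar())
```

The result can be passed to tqdm's `disable` argument, e.g. `tqdm(loader, disable=should_disable_progress_bar())`, where `loader` stands for whatever iterable the training loop uses. Because `get_rank()` is only queried after `is_initialized()` returns True, a single-GPU run never touches the default process group.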

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
loretoparisi commented, Nov 25, 2019

@jongwook Yes, I can make it work with the changes I have made so far. I will investigate NCCL further, by the way. Thanks.

1 reaction
jongwook commented, Nov 20, 2019

Can you paste the output of the following bash command, so we can check your system information?

curl https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py | python -

I suspect two possibilities:

  • Your NCCL installation is incomplete: try this to (re)install it
  • You’re on Windows: we don’t plan to support Windows at this point.
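
As a quick first look at those two possibilities, something like the following could be run before the full collect_env.py script. This is only a sketch and not a replacement for that script: the function name `distributed_env_report` is made up for illustration, and it reports just the operating system, the PyTorch version, and whether the distributed package and the NCCL backend are available in the installed build.

```python
import platform


def distributed_env_report() -> dict:
    """Collect a few facts relevant to the two suspected causes."""
    report = {
        "os": platform.system(),  # "Windows", "Linux", "Darwin", ...
        "torch": None,
        "distributed_available": False,
        "nccl_available": False,
    }
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        return report  # PyTorch is not installed at all

    report["torch"] = torch.__version__
    report["distributed_available"] = dist.is_available()
    # Only query the NCCL backend when the distributed package is built in
    if dist.is_available():
        report["nccl_available"] = dist.is_nccl_available()
    return report


if __name__ == "__main__":
    for key, value in distributed_env_report().items():
        print(f"{key}: {value}")
```

If `nccl_available` comes back False on Linux, that would point toward the first possibility; an `os` of `Windows` would point toward the second.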

Top Results From Across the Web

Default process group is not initialized · Issue #131 · mapillary ...
I'm interested to know how the code is run: directly from the Python interpreter vs. ipython vs. a script launched with python script.py...

RuntimeError: Default process group has not been initialized ...
I'm training the model with DistributedDataParallel and made weight file. Then trying to load the pth file with model and eval

Error when using train.checkpoint - Ray
When I was trying to use the checkpoint in the ray train, I came across ... ERROR serialization.py:270 -- Default process group has...

How to solve dist.init_process_group from hanging (or ...
... default distributed process group, and this will also initialize the distributed package. dist.init_process_group(backend, rank=rank, ...

AssertionError: Default process group is not initialized
The author solved this problem as follows: if the project contains code for distributed training but you are not using distributed training, do not enable syncbn.
