The program stops at line `dist.init()`
See original GitHub issue

After I run

```
CUDA_VISIBLE_DEVICES=1 torchpack dist-run -np 1 python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml 2>&1 | tee ./train.log
```

the program stops at the line `dist.init()` in train.py and cannot continue to run.
Is there something wrong? Could you please help me solve this problem?
Environment: cudatoolkit 10.2, pytorch 1.8.0, python 3.6, openmpi 4.1.1
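For context on why `dist.init()` can hang: assuming torchpack's `dist.init()` ultimately performs the same kind of TCP rendezvous as `torch.distributed`'s env:// initialization, a hang at that line usually means the launched processes never agree on a master address/port. Below is a standard-library-only model of that rendezvous; all names (`run_master`, `run_worker`) are illustrative, not torchpack's real API.

```python
# Illustrative model (standard library only) of the TCP rendezvous that
# torch.distributed's env:// initialization performs: every rank must
# agree on MASTER_ADDR and MASTER_PORT, otherwise the non-master ranks
# block at startup -- the "stuck at dist.init()" symptom.
import socket
import threading

MASTER_ADDR = "127.0.0.1"
MASTER_PORT = 29500  # torch.distributed's conventional default port

def run_master(port: int, ready: threading.Event, world_size: int) -> None:
    # Rank 0 listens and waits for every other rank to check in.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((MASTER_ADDR, port))
        srv.listen(world_size - 1)
        ready.set()  # signal that the master is accepting connections
        for _ in range(world_size - 1):
            conn, _ = srv.accept()
            conn.close()

def run_worker(port: int, timeout: float = 2.0) -> bool:
    # A non-zero rank connects to the master. With the wrong port the
    # connection never succeeds, so initialization appears to hang.
    try:
        socket.create_connection((MASTER_ADDR, port), timeout=timeout).close()
        return True
    except OSError:
        return False

if __name__ == "__main__":
    ready = threading.Event()
    t = threading.Thread(target=run_master, args=(MASTER_PORT, ready, 2), daemon=True)
    t.start()
    ready.wait()
    print(run_worker(MASTER_PORT))      # matching port: rendezvous completes
    t.join()
```

If this minimal rendezvous also stalls on your machine (e.g. the port is firewalled or already taken), the problem is in the environment rather than in train.py.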
Issue Analytics
- State:
- Created 2 years ago
- Comments: 8 (3 by maintainers)
Hi @zhijian-liu ,
Thanks for your reply!
I have changed the command to

```
torchpack dist-run -np 3 python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml 2>&1 | tee ./train.log
```

The program still stops at the line `dist.init()` in train.py and cannot continue to run, with only `Failed to import tensorflow` printed on the screen. What is wrong with it?
I finally chose not to use MPI. The MPI-related constant parameters are set to context.py's default values.
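A minimal sketch of that kind of fallback, assuming the launcher is Open MPI (which exports `OMPI_COMM_WORLD_*` environment variables to each rank); the helper name and the idea of reading these variables directly are illustrative, not torchpack's actual context.py logic:

```python
# Hypothetical sketch: detect whether the script was launched by an
# Open MPI launcher (which exports OMPI_COMM_WORLD_* variables) and
# otherwise default to a single process, mirroring the default values
# that context.py would supply when MPI is not used.
import os

def mpi_rank_and_size(env=None):
    """Return (rank, world_size); (0, 1) when not launched via Open MPI."""
    env = os.environ if env is None else env
    rank = int(env.get("OMPI_COMM_WORLD_RANK", 0))
    size = int(env.get("OMPI_COMM_WORLD_SIZE", 1))
    return rank, size

if __name__ == "__main__":
    rank, world_size = mpi_rank_and_size()
    if world_size == 1:
        print("single-process fallback")  # no MPI launcher detected
```

With this shape, running `python train.py ...` directly (without `torchpack dist-run`) naturally falls back to single-process mode.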