
How to run '-m torch.distributed.launch' if I do training in a Jupyter notebook

See original GitHub issue

❓ Questions and Help

The notebook raises an error when I set the GPU count to more than 2.

ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

This error seems to come from not launching with '-m torch.distributed.launch'. But how can I make it run in a Jupyter notebook?
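For context on the error itself: the env:// rendezvous reads a handful of environment variables that torch.distributed.launch normally exports for each worker process. A minimal sketch of what that looks like, setting the variables by hand for a single-process group (the address and port values here are illustrative defaults, and this only yields a one-process group, not real multi-GPU training):

```python
import os
import torch.distributed as dist

# torch.distributed.launch normally exports these variables for every worker
# process; the env:// rendezvous reads them inside init_process_group().
# In a notebook no launcher runs, so RANK is missing -- hence the ValueError.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")          # this process's global rank
os.environ.setdefault("WORLD_SIZE", "1")    # total number of processes

# With the variables in place, env:// initialization succeeds -- but only as
# a single-process group on the CPU-friendly gloo backend.
dist.init_process_group(backend="gloo", init_method="env://")
print(dist.get_rank(), dist.get_world_size())  # -> 0 1
```

This shows why the error appears, not how to get multiple workers: each worker would need its own process with its own RANK, which a notebook cell does not provide.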

Thanks!

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
YubinXie commented, Mar 12, 2019

Got it. Thanks!

0 reactions
YubinXie commented, Mar 20, 2019

Well, if you did not read the above conversation, the take-home message is: there is no multi-GPU training in a notebook; you can only use multiple GPUs from a standalone Python script.
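The take-home message above can be sketched as such a standalone script. This is a minimal illustration, not the thread's own code: the filename, port, and use of the CPU-only gloo backend are assumptions, and the all_reduce stands in for a real training step.

```python
# minimal_spawn.py -- a hypothetical standalone script; run it with
# `python minimal_spawn.py` from a terminal rather than a notebook cell.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Each spawned process sets up its own rendezvous, providing the
    # environment variables that torch.distributed.launch would otherwise set.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Stand-in for a real training step: sum a tensor across all workers.
    t = torch.ones(1)
    dist.all_reduce(t)
    assert t.item() == world_size

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # e.g. one process per GPU; gloo/CPU here for illustration
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```

torch.multiprocessing.spawn is the script-side alternative to the launcher: it forks the worker processes itself, which is exactly the step a notebook kernel cannot perform for you.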

Read more comments on GitHub >

Top Results From Across the Web

How to launch a distributed training | fastai
In your terminal, type the following line (adapt num_gpus and script_name to the number of GPUs you want to use and your script...

Multi node training with PyTorch DDP, torch.distributed.launch ...
This video goes over how to perform multi node distributed training with PyTorch DDP. Based on the blog post: "Multi-node PyTorch Distributed ...

Torch.distributed.launch hanged - PyTorch Forums
Hi, I am trying to leverage parallelism with distributed training but my process seems to be hanging or getting into 'deadlock' sort of...

Launching Multi-Node Training from a Jupyter Environment
Now you can build the training loop. notebook_launcher() works by passing in a function to call that will be run across the distributed...

Multi node PyTorch Distributed Training Guide For People In A ...
Although the above torch.distributed.launch method works "out of the box" as the native PyTorch API, one has to modify and run the launch ...
