Update examples/documentation for torch.distributed.run
Most examples currently read the local rank from a CLI argument, --local_rank. This is required for them to work with the torch launch utility torch.distributed.launch. However, as of recent torch versions, launch is deprecated. The suggested way to run distributed code from the CLI is now torch.distributed.run. The difference is that instead of automatically passing the rank as a CLI argument, the new launcher sets it as an environment variable. The examples therefore need to be updated to read the environment variable instead of the CLI argument.
See https://pytorch.org/docs/stable/elastic/run.html#launcher-api for more.
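As a minimal sketch of the change described above (an illustrative example, not code from the issue): with torch.distributed.run, the local rank is read from the LOCAL_RANK environment variable rather than from a --local_rank CLI argument. The default of 0 is an assumption so the script also runs standalone.

```python
import os

# torch.distributed.run exports LOCAL_RANK (along with RANK and
# WORLD_SIZE) as environment variables for each worker process.
# Fall back to 0 so the script also works when run without a launcher.
local_rank = int(os.environ.get("LOCAL_RANK", 0))

print(f"local rank: {local_rank}")
```

A launcher invocation would then look like `torch.distributed.run --nproc_per_node=2 script.py`, with no --local_rank argument needed.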
Issue Analytics
- State:
- Created: 2 years ago
- Comments: 9 (4 by maintainers)
Definitely. I think it’s a good idea to keep this issue open, so that others can find it easily, and it can serve as a reminder that as soon as launch is obsolete, the docs need an update. It’s a tiny thing and easy to forget, so an open issue may help us remember as time goes on.

I wouldn’t update the documentation yet, as the new approach won’t work for older versions of PyTorch, and I would prefer to document only one command rather than “if your PyTorch version is this then xxx, else yyy”. Let’s wait a bit more 😃
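For completeness, the version-conditional workaround the comment above wants to avoid could be sketched as a compatibility shim (a hypothetical example, not something proposed in the issue): prefer the LOCAL_RANK environment variable set by torch.distributed.run, but fall back to the --local_rank CLI argument that the deprecated torch.distributed.launch passes.

```python
import argparse
import os

# Hypothetical compatibility shim supporting both launchers.
# parse_known_args tolerates the script's other CLI arguments.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0,
                    help="passed by the deprecated torch.distributed.launch")
args, _ = parser.parse_known_args()

# torch.distributed.run sets LOCAL_RANK; fall back to the CLI argument.
local_rank = int(os.environ.get("LOCAL_RANK", args.local_rank))
```

This keeps old launch commands working while also supporting torch.distributed.run, at the cost of the dual-path documentation the maintainers would prefer to avoid.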