
Why use torch.multiprocessing.spawn for distributed training

See original GitHub issue

Hi there,

In the Swin UNETR scripts, e.g., https://github.com/Project-MONAI/research-contributions/blob/main/SwinUNETR/BRATS21/main.py, torch.multiprocessing.spawn is used for launching distributed training. Any reason why you didn’t use torch.distributed.launch? Did torch.multiprocessing.spawn give better performance than torch.distributed.launch for BraTS/BTCV-based Swin UNETR training?

Thanks!
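For readers unfamiliar with the spawn-based launch being asked about, the sketch below shows the general torch.multiprocessing.spawn + DDP pattern that scripts like main.py follow. It is a minimal illustration only: the worker body, model, tensor shapes, and port are placeholders, not code taken from the MONAI repository.

```python
# Minimal sketch of launching DDP workers with torch.multiprocessing.spawn.
# The model, data, and argument names are illustrative placeholders.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def main_worker(rank, world_size):
    # One process per GPU; `rank` is passed in automatically by mp.spawn.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 1).cuda(rank)      # stand-in for SwinUNETR
    model = DDP(model, device_ids=[rank])

    x = torch.randn(2, 10, device=rank)
    model(x).sum().backward()                      # gradients all-reduced by DDP

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    # Launched with plain `python script.py`; spawn starts one process per GPU.
    mp.spawn(main_worker, nprocs=world_size, args=(world_size,))
```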

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
tangy5 commented, Dec 8, 2022

Thank you for the clarification. Here are the initial logs:

[screenshot: timing log, single GPU, batch_size=1]

[screenshot: timing log, 2 GPUs, batch_size=2]

Per-step time on multi-GPU keeps getting longer as the number of GPUs increases, and it would be even worse when running with batch_size=1 on multiple GPUs.

To be clear: yes, with a single GPU the batch size is 1 and with 2 GPUs the batch size is 2, so the per-step/iteration time is expected to be longer, but it should stay below 2x the single-GPU time. You can see that 2-GPU training is faster here, just not exactly 2x faster; it is about 1.7x.
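As a back-of-the-envelope check of that ~1.7x figure, the snippet below computes throughput (samples per second) from per-step times. The step times used are hypothetical stand-ins chosen for illustration, not the numbers from the screenshots above.

```python
# Hypothetical per-step times (seconds/iteration); not taken from the logs above.
t_1gpu_bs1 = 1.00   # single GPU, batch_size=1
t_2gpu_bs2 = 1.18   # 2 GPUs, global batch_size=2 (per-step time grows, but < 2x)

throughput_1gpu = 1 / t_1gpu_bs1   # samples/second
throughput_2gpu = 2 / t_2gpu_bs2   # samples/second

speedup = throughput_2gpu / throughput_1gpu
print(f"speedup: {speedup:.2f}x")  # ~1.7x, i.e. below the ideal 2x
```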

1 reaction
tangy5 commented, Sep 2, 2022

Hi @hw-ju, SwinUNETR has been tested for multi-GPU training with both the DDP launcher (torch.distributed.launch) and mp.spawn. Both work well, and there is no performance preference between the two multi-GPU launch methods. You can safely use DDP. Thank you!
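For comparison with the spawn sketch above, here is a minimal sketch of the launcher-driven alternative the question asks about: the same kind of DDP worker started by torchrun or torch.distributed.launch instead of mp.spawn. The script name, model, and shapes are placeholders, not MONAI code.

```python
# train_ddp.py -- minimal sketch of a launcher-driven DDP script (placeholder names).
# Run with either of:
#   torchrun --nproc_per_node=2 train_ddp.py
#   python -m torch.distributed.launch --use_env --nproc_per_node=2 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # The launcher sets RANK, WORLD_SIZE, MASTER_ADDR/PORT, and LOCAL_RANK;
    # init_process_group picks them up via the default env:// rendezvous.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).cuda(local_rank)   # stand-in for SwinUNETR
    model = DDP(model, device_ids=[local_rank])

    x = torch.randn(2, 10, device=local_rank)
    model(x).sum().backward()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```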

Read more comments on GitHub.

Top Results From Across the Web

  • Torch.distributed.launch vs torch.multiprocessing.spawn: "If you need multi-server distributed data parallel training, it might be more convenient to use torch.distributed.launch as it automatically ..."
  • Why using mp.spawn is slower than using torch.distributed ...: "mp.spawn is usually slower due to initialization overhead. In general distributed training is long running, so usually the initialization time ..."
  • Distributed Computing with PyTorch - Shiv Gehlot: "Hence, torch.multiprocessing.spawn can be used to spawn the training function fn() on each of the GPUs through args."
  • Writing Distributed Applications with PyTorch: "torch.distributed enables researchers and practitioners to easily parallelize their computations across processes and clusters of machines ..."
  • Distributed Training Made Easy with PyTorch-Ignite: "Then we will also cover several ways of spawning processes via the torch-native torch.multiprocessing.spawn and also via multiple distributed ..."
