multi-gpu ddp calls validation and testing loops too many times
When using ddp with multiple GPUs, each validation and test loop runs over the entire validation dataset on every GPU.
Expected behavior is that the dataset is divided appropriately across the gpus.
I am using current master (cloned Mar 14), Ubuntu 19.10, CUDA 10.1, Python 3.7.5, PyTorch 1.4, in a venv environment.
The problem appears to be in auto_add_sampler() in data_loading.py. It does not create a DistributedSampler for validation or test datasets.
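To make the expected behavior concrete, here is a minimal pure-Python sketch of how a validation set would be split if each process only received its own shard. The helper name shard_indices is hypothetical and not part of Lightning or PyTorch; it only approximates the strided, wrap-around partitioning that a distributed sampler performs when shuffling is off.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Hypothetical helper: sample indices that one process would evaluate."""
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    # Every rank gets the same number of samples, so round up.
    per_rank = math.ceil(dataset_len / num_replicas)
    total = per_rank * num_replicas
    # Wrap around to pad the index list up to `total` entries.
    padded = [i % dataset_len for i in range(total)]
    # Strided slice: rank r takes positions r, r + num_replicas, ...
    return padded[rank:total:num_replicas]


# Assumed example sizes: 10 validation samples on 4 GPUs.
val_len, num_gpus = 10, 4
for rank in range(num_gpus):
    print(rank, shard_indices(val_len, num_gpus, rank))
```

With sharding in place each process sees 3 samples here instead of all 10; a couple of samples are repeated as padding so that all ranks run the same number of steps.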
Issue Analytics
- State:
- Created 4 years ago
- Comments: 6 (6 by maintainers)
Will do on both, PR and hash ref.
Testing underway. Will make PR tomorrow.