Multiple GPUs connected for training, but average utilization per GPU is lower than with a single GPU
Describe the bug
Trained a classification model with a single GPU and saw an average GPU utilization of ~40%. I assumed that using more GPUs by increasing n_gpu in args would speed training up in proportion to the number of GPUs supplied. In reality, the total GPU utilization stays the same at ~40%, but the average utilization per GPU drops to ~7%.
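For reference, a minimal sketch of the kind of setup described above, assuming the simpletransformers ClassificationModel API; the model type, dataset, and concrete argument values are assumptions for illustration, not taken from the issue:

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Hypothetical toy data; the original dataset is not shown in the issue.
train_df = pd.DataFrame(
    [["an example of a positive sample", 1], ["an example of a negative sample", 0]],
    columns=["text", "labels"],
)

# Single-GPU baseline: reportedly ~40% average GPU utilization.
model = ClassificationModel("roberta", "roberta-base", args={"n_gpu": 1})

# Multi-GPU attempt: only n_gpu is raised. Each batch is split across the GPUs,
# so total utilization stays ~40% while per-GPU utilization drops to ~7%.
# model = ClassificationModel("roberta", "roberta-base", args={"n_gpu": 4})

model.train_model(train_df)
```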
Screenshots
[Screenshot: GPU utilization with a single GPU]
[Screenshot: GPU utilization with multiple GPUs]
Desktop (please complete the following information):
- OS: 4.15.0-124-generic #127-Ubuntu SMP Fri Nov 6 10:54:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Top Results From Across the Web

Efficient Training on Multiple GPUs - Hugging Face
When training on a single GPU is too slow or the model weights don't fit in a single GPU's memory we use a...

How to scale training on multiple GPUs - Towards Data Science
Each GPU does its forward pass, then the gradients are all-reduced across the GPUs.

Multi system multi gpu distributed training slower than single ...
Regarding the GPU utilization: on a single machine with 4 GPUs the GPU utilization is close to 80% (kvstore=device). If I use a distributed setup...

Multi-GPU programming with CUDA. A complete ... - Medium
To solve this issue we need to abandon the single-thread multiple GPUs programming model. Let's assign each GPU to its own thread. By...

13.5. Training on Multiple GPUs - Dive into Deep Learning
For instance, rather than computing 64 channels on a single GPU we could ... In what follows we will use a toy network...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Sadly, this is probably because of issues with DataParallel from PyTorch. I am planning to upgrade to DistributedDataParallel, which should help. There’s no timeframe as of yet though.
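For context, a generic PyTorch sketch (not the library's actual training loop) of the difference being described: nn.DataParallel drives every GPU from a single process, which easily leaves the extra GPUs starved, while DistributedDataParallel runs one worker process per GPU and all-reduces gradients between them. Function and variable names here are illustrative.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

# Current approach: DataParallel. One process scatters each batch across the
# GPUs and gathers the outputs, so the single data/Python thread becomes the bottleneck.
def train_data_parallel(model: nn.Module, loader):
    model = nn.DataParallel(model).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for x, y in loader:
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x.cuda()), y.cuda())
        loss.backward()
        opt.step()

# Planned approach: DistributedDataParallel. One worker process per GPU;
# gradients are all-reduced across workers during each backward pass.
def ddp_worker(rank: int, world_size: int, model_fn, loader_fn):
    torch.distributed.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:29500",
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(rank)
    model = DDP(model_fn().cuda(rank), device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for x, y in loader_fn(rank):  # each rank reads its own shard of the data
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x.cuda(rank)), y.cuda(rank))
        loss.backward()  # DDP overlaps the gradient all-reduce with backward
        opt.step()
    torch.distributed.destroy_process_group()

def train_distributed(model_fn, loader_fn, world_size: int):
    # model_fn and loader_fn are factories so each spawned process builds its own copies.
    mp.spawn(ddp_worker, args=(world_size, model_fn, loader_fn), nprocs=world_size)
```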
Your GPU utilization should go up if you increase the batch size. Right now, your GPUs are getting data starved.
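To illustrate that suggestion (again assuming a simpletransformers-style args dict; the concrete numbers are made up): with DataParallel the configured batch is split across the GPUs, so scaling the batch size by the number of GPUs keeps each GPU's share the same as in the single-GPU run.

```python
from simpletransformers.classification import ClassificationModel

n_gpus = 4
single_gpu_batch_size = 16  # hypothetical value from the single-GPU run

# Scale the batch so each GPU still processes single_gpu_batch_size samples per step.
model = ClassificationModel(
    "roberta",
    "roberta-base",
    args={
        "n_gpu": n_gpus,
        "train_batch_size": single_gpu_batch_size * n_gpus,  # 64 in this sketch
    },
)
```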