Multiple GPUs connected for training, but average utilization per GPU is lower than with a single GPU
Describe the bug
Trained a classification model with a single GPU and saw an average GPU utilization of ~40%. I assumed that using more GPUs by increasing n_gpu in args would speed training up in proportion to the number of GPUs supplied. In reality, the total GPU utilization stays the same at ~40%, but the average utilization per GPU drops to ~7%.
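For reference, a minimal sketch of the kind of setup described above, assuming the simpletransformers ClassificationModel API; the model type, dataset, and concrete argument values are assumptions for illustration, not taken from the issue:

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Hypothetical toy data; the original dataset is not shown in the issue.
train_df = pd.DataFrame(
    [["an example of a positive sample", 1], ["an example of a negative sample", 0]],
    columns=["text", "labels"],
)

# Single-GPU baseline: reportedly ~40% average GPU utilization.
model = ClassificationModel("roberta", "roberta-base", args={"n_gpu": 1})

# Multi-GPU attempt: only n_gpu is raised. Each batch is split across the GPUs,
# so total utilization stays ~40% while per-GPU utilization drops to ~7%.
# model = ClassificationModel("roberta", "roberta-base", args={"n_gpu": 4})

model.train_model(train_df)
```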
Screenshots
[Screenshot: GPU utilization with a single GPU]
[Screenshot: GPU utilization with multiple GPUs]
Desktop (please complete the following information):
- OS: 4.15.0-124-generic #127-Ubuntu SMP Fri Nov 6 10:54:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Top Results From Across the Web

Efficient Training on Multiple GPUs - Hugging Face
When training on a single GPU is too slow or the model weights don't fit in a single GPU's memory we use a...

How to scale training on multiple GPUs - Towards Data Science
Each GPU does its forward pass, then the gradients are all-reduced across the GPUs.

Multi system multi gpu distributed training slower than single ...
Regarding the GPU utilization: on a single machine with 4 GPUs the GPU utilization is close to 80% (kvstore=device). If I use a distributed setup...

Multi-GPU programming with CUDA. A complete ... - Medium
To solve this issue we need to abandon the single-thread multiple GPUs programming model. Let's assign each GPU to its own thread. By...

13.5. Training on Multiple GPUs - Dive into Deep Learning
For instance, rather than computing 64 channels on a single GPU we could ... In what follows we will use a toy network...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Sadly, this is probably because of issues with DataParallel from PyTorch. I am planning to upgrade to DistributedDataParallel, which should help. There’s no timeframe as of yet though.
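For context, a generic PyTorch sketch (not the library's actual training loop) of the difference being described: nn.DataParallel drives every GPU from a single process, which easily leaves the extra GPUs starved, while DistributedDataParallel runs one worker process per GPU and all-reduces gradients between them. Function and variable names here are illustrative.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

# Current approach: DataParallel. One process scatters each batch across the
# GPUs and gathers the outputs, so the single data/Python thread becomes the bottleneck.
def train_data_parallel(model: nn.Module, loader):
    model = nn.DataParallel(model).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for x, y in loader:
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x.cuda()), y.cuda())
        loss.backward()
        opt.step()

# Planned approach: DistributedDataParallel. One worker process per GPU;
# gradients are all-reduced across workers during each backward pass.
def ddp_worker(rank: int, world_size: int, model_fn, loader_fn):
    torch.distributed.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:29500",
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(rank)
    model = DDP(model_fn().cuda(rank), device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for x, y in loader_fn(rank):  # each rank reads its own shard of the data
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x.cuda(rank)), y.cuda(rank))
        loss.backward()  # DDP overlaps the gradient all-reduce with backward
        opt.step()
    torch.distributed.destroy_process_group()

def train_distributed(model_fn, loader_fn, world_size: int):
    # model_fn and loader_fn are factories so each spawned process builds its own copies.
    mp.spawn(ddp_worker, args=(world_size, model_fn, loader_fn), nprocs=world_size)
```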
Your GPU utilization should go up if you increase the batch size. Right now, your GPUs are getting data starved.
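To illustrate that suggestion (again assuming a simpletransformers-style args dict; the concrete numbers are made up): with DataParallel the configured batch is split across the GPUs, so scaling the batch size by the number of GPUs keeps each GPU's share the same as in the single-GPU run.

```python
from simpletransformers.classification import ClassificationModel

n_gpus = 4
single_gpu_batch_size = 16  # hypothetical value from the single-GPU run

# Scale the batch so each GPU still processes single_gpu_batch_size samples per step.
model = ClassificationModel(
    "roberta",
    "roberta-base",
    args={
        "n_gpu": n_gpus,
        "train_batch_size": single_gpu_batch_size * n_gpus,  # 64 in this sketch
    },
)
```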