Multi-gpu is taking more time than single gpu

I tried running the downstream ASR model using both single-gpu and multi-gpu (DDP) settings:

Single-gpu command: python3 run_downstream.py -m train -n asr_tera -u tera -d asr

Multi-gpu command:

distributed="-m torch.distributed.launch --nproc_per_node 4";
python3 $distributed run_downstream.py -m train -n asr_tera_ddp -u tera -d asr -o config.runner.gradient_accumulate_steps=2

However, the multi-gpu code is taking more time than the single-gpu one:

  • Time taken by the multi-gpu setting: ~6 days (using 4 GPUs)
  • Time taken by the single-gpu setting: ~3 days (using 1 GPU)

Do you have any idea why this is happening? Have you tested the code with DDP?
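
[Not part of the original issue — a minimal, hypothetical sketch for narrowing down where the extra time goes. It times a few optimizer steps on a stand-in model with random in-memory data (so data loading is excluded), letting the DDP per-step overhead, mostly the gradient all-reduce triggered in backward(), be compared against a single-process run. The model, tensor sizes, and script name (ddp_step_check.py) are placeholders, not the actual s3prl ASR downstream.]

# ddp_step_check.py -- hypothetical timing sketch, not from the s3prl repo
import os
import time

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun (or torch.distributed.launch --use_env) sets LOCAL_RANK,
    # RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model and per-GPU batch; the real ASR downstream model would go here.
    model = DDP(torch.nn.Linear(1024, 1024).cuda(local_rank), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x = torch.randn(12, 1024, device=f"cuda:{local_rank}")

    torch.cuda.synchronize()
    start = time.time()
    for _ in range(50):
        opt.zero_grad()
        model(x).sum().backward()  # backward() triggers the gradient all-reduce
        opt.step()
    torch.cuda.synchronize()

    if dist.get_rank() == 0:
        print(f"avg step time: {(time.time() - start) / 50 * 1000:.2f} ms")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launching it with torchrun --nproc_per_node 4 ddp_step_check.py versus torchrun --nproc_per_node 1 ddp_step_check.py gives a rough per-step comparison: if the 4-process step time is much larger than the 1-process one, communication (or, in the real training script, data loading) is the likely bottleneck rather than GPU compute.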

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

Sindhu-Hegde commented on Feb 2, 2022

Hi, here are the detailed configs used in both settings:

Single GPU:

num GPU=1
gradient_accumulate_steps=1
total_steps=200000
batch_size=12
test-clean wer=18.18

Multiple GPUs (DDP):

num GPU=2
gradient_accumulate_steps=2
total_steps=100000
batch_size=6
test-clean wer=20.01

leo19941227 commented on Apr 18, 2022

Hi,

Sorry for the late reply, and thanks for the detailed information! From my point of view, the settings in each of the following two comparisons should be equivalent:

num GPU=1
gradient_accumulate_steps=1
total_steps=200000
batch_size=12

vs.

num GPU=2
gradient_accumulate_steps=1
total_steps=200000
batch_size=6

or

num GPU=1
gradient_accumulate_steps=2
total_steps=100000
batch_size=12

vs.

num GPU=2
gradient_accumulate_steps=1
total_steps=100000
batch_size=12

The above two comparisons are equivalent. However, the comparison you shared is not equivalent from my point of view, since the second setting actually uses a larger effective batch size (2 GPUs × batch_size 6 × gradient_accumulate_steps 2 = 24) and fewer steps (100000). In practice, we usually use a larger batch size to get a more accurate gradient estimate, so that training converges faster and requires fewer steps. However, this does not mean that 2x batch size & 0.5x training steps is mathematically equivalent to 1x batch size & 1x training steps. In your case, it seems the larger batch size does not help, or even yields a worse WER. In my experience, our ASR setting does not benefit from a larger batch size, which roughly aligns with your results.
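
[Not from the original thread — a small arithmetic sketch of the point above, using the configs listed earlier and assuming the effective batch size per optimizer update is num_gpus × per-GPU batch_size × gradient_accumulate_steps:]

# Hypothetical helper: effective batch per update under DDP with gradient accumulation.
def effective_batch(num_gpus, batch_size, grad_accum):
    return num_gpus * batch_size * grad_accum

configs = {
    "reported single-GPU (1 GPU, batch 12, accum 1, 200k steps)": (1, 12, 1, 200_000),
    "reported DDP (2 GPUs, batch 6, accum 2, 100k steps)": (2, 6, 2, 100_000),
    "equivalent DDP (2 GPUs, batch 6, accum 1, 200k steps)": (2, 6, 1, 200_000),
}

for name, (gpus, bs, accum, steps) in configs.items():
    eb = effective_batch(gpus, bs, accum)
    print(f"{name}: effective batch = {eb}, total samples = {eb * steps:,}")

# Output:
# reported single-GPU (1 GPU, batch 12, accum 1, 200k steps): effective batch = 12, total samples = 2,400,000
# reported DDP (2 GPUs, batch 6, accum 2, 100k steps): effective batch = 24, total samples = 2,400,000
# equivalent DDP (2 GPUs, batch 6, accum 1, 200k steps): effective batch = 12, total samples = 2,400,000

All three settings see the same total amount of data, but the reported DDP run packs it into half as many optimizer updates with twice the effective batch, which is why it is not mathematically equivalent to the single-GPU run.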

Hence, the above result does not look very weird to me. Please feel free to point out my mistake if you think I am wrong. Thanks!

I am closing this issue for now. Feel free to re-open it!

Sincerely, Leo
