
About multi-GPU loss calculation

See original GitHub issue

Thanks for your nice work! I notice there is a mean() when the program runs on multiple GPUs, but there is no gather operation. In other words, the loss in
https://github.com/microsoft/UniVL/blob/0a7c07f566a3b220731f4abcaa6e1ee59a686596/main_pretrain.py#L332 is a scalar, not a list of tensors. Am I right?
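
For background, a minimal sketch (an editor's illustration, not UniVL's actual code) of why that mean() exists at all: torch.nn.DataParallel gathers each replica's scalar loss into a tensor of shape (num_gpus,). The toy model below is hypothetical and assumes at least one CUDA device:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical model whose forward() returns the loss, mirroring the
# pattern used in main_pretrain.py.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x, y):
        return F.mse_loss(self.fc(x), y)  # 0-dim scalar on each replica

model = nn.DataParallel(ToyModel().cuda())  # replicates across visible GPUs
x, y = torch.randn(8, 10).cuda(), torch.randn(8, 1).cuda()

loss = model(x, y)
# With N GPUs, DataParallel gathers the per-replica scalars into a tensor
# of shape (N,), so a mean() is needed before backward(). Under
# DistributedDataParallel each process sees a plain scalar instead.
if loss.dim() > 0:
    loss = loss.mean()
loss.backward()
```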

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
ArrowLuo commented, Jul 9, 2021

For your reference, the loss goes roughly 0.13 -> 0.02 and 0.12 -> 0.09 at the two stages. The numbers are not exact because the logs were corrupted by a machine problem. Once again, convergence is what matters most.

1 reaction
ArrowLuo commented, Jul 9, 2021

Hi @forence, you are right. I confused torch.nn.DataParallel with torch.nn.parallel.DistributedDataParallel. Thank you for pointing it out. The mean() is indeed redundant in our code. Thanks.
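
To make the distinction concrete, a minimal sketch (assuming a model whose forward() returns the loss, as in the toy example above, and a process group initialized elsewhere, e.g. by torchrun): under DistributedDataParallel the loss is already a local 0-dim scalar, and gradient averaging happens across processes inside backward(), so an extra mean() is a harmless no-op.

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes dist.init_process_group("nccl") has already run (e.g. under
# torchrun) and `model` wraps a module whose forward() returns the loss.
def train_step(model: DDP, optimizer, x, y):
    optimizer.zero_grad()
    loss = model(x, y)  # 0-dim scalar, local to this process
    # loss.mean() here would return the same scalar unchanged; the
    # cross-process averaging of gradients happens inside backward()
    # via all-reduce.
    loss.backward()
    optimizer.step()
    return loss.item()
```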

Read more comments on GitHub >

Top Results From Across the Web

PyTorch Multi GPU: 3 Techniques Explained - Run:AI
There are three main ways to use PyTorch with multiple GPUs. These are: ... averages GPU-losses and performs a backward pass loss.mean().backward()
Read more >
Efficient Training on Multiple GPUs - Hugging Face
To calculate the global batch size of the DP + PP setup we then do: mbs*chunks*dp_degree ( 8*32*4=1024 ). Let's go back to...
Read more >
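
Spelling out the arithmetic in the Hugging Face snippet above (values taken directly from it):

```python
mbs = 8        # micro-batch size
chunks = 32    # pipeline chunks (gradient accumulation steps)
dp_degree = 4  # data-parallel replicas
print(mbs * chunks * dp_degree)  # global batch size: 1024
```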
13.5. Training on Multiple GPUs - Dive into Deep Learning
Multiple GPUs, after all, increase both memory and computation ability. ... Each GPU calculates loss and gradient of the model parameters based on...
Read more >
When calculate loss in model forward with multi-gpu training ...
Hi everyone, when I use F.nn_loss() in the model forward as above, then I use two GPUs to train the model in the form of model...
Read more >
How to scale training on multiple GPUs - Towards Data Science
The loss function is calculated, comparing the predicted label with the ground-truth label; The backward pass is done, calculating the gradients ...
Read more >
