About multi-gpu loss calculation
Thanks for your nice work! I notice there is a `mean()` applied to the loss when the program runs on multiple GPUs, but there is no gather operation anywhere. In other words, the loss in
https://github.com/microsoft/UniVL/blob/0a7c07f566a3b220731f4abcaa6e1ee59a686596/main_pretrain.py#L332
is a scalar, not a list of tensors. Am I right?
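For context, here is a minimal sketch (a toy model, not the UniVL code) of why a `mean()` with no explicit gather call shows up in training loops written for `torch.nn.DataParallel`: when the loss is computed inside the model's `forward()`, DataParallel gathers the per-replica losses onto the default device automatically and returns them as a 1-D tensor with one entry per GPU.

```python
# Minimal sketch (assumptions: toy model and random data, not the UniVL training code).
# With torch.nn.DataParallel, if the loss is computed inside forward(), the outputs of
# all replicas are gathered automatically, so the returned "loss" is a 1-D tensor with
# one entry per GPU; .mean() reduces it to a scalar before backward().
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 1)

    def forward(self, x, y):
        pred = self.linear(x)
        # Loss computed inside forward, as in many multi-task training codes.
        return nn.functional.mse_loss(pred, y)

model = ToyModel()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicate the module across all visible GPUs
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

device = next(model.parameters()).device
x = torch.randn(32, 16, device=device)
y = torch.randn(32, 1, device=device)

loss = model(x, y)
# Under DataParallel, loss has shape (num_gpus,); on a single GPU/CPU it is a 0-d scalar.
if loss.dim() > 0:
    loss = loss.mean()
loss.backward()
```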
Issue Analytics
- Created: 2 years ago
- Comments: 10 (5 by maintainers)
Top Results From Across the Web
PyTorch Multi GPU: 3 Techniques Explained - Run:AI
There are three main ways to use PyTorch with multiple GPUs. These are: ... averages GPU-losses and performs a backward pass loss.mean().backward()
Read more >

Efficient Training on Multiple GPUs - Hugging Face
To calculate the global batch size of the DP + PP setup we then do: mbs*chunks*dp_degree ( 8*32*4=1024 ). Let's go back to...
Read more >

13.5. Training on Multiple GPUs - Dive into Deep Learning
Multiple GPUs, after all, increase both memory and computation ability. ... Each GPU calculates loss and gradient of the model parameters based on...
Read more >

When calculate loss in model forward with multi-gpu training ...
Hi everyone, when I use F.nn_loss() in model forward as above. Then I two GPUs to train the model in form of model...
Read more >

How to scale training on multiple GPUs - Towards Data Science
The loss function is calculated, comparing the predicted label with the ground-truth label; The backward pass is done, calculating the gradients ...
Read more >
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
For your reference: 0.13 -> 0.02 and 0.12 -> 0.09 at the two stages. These figures are not exact because the logs were corrupted by machine problems. Once again, convergence is what matters most.
Hi @forence, you are right. I confused `torch.nn.DataParallel` with `torch.nn.parallel.DistributedDataParallel`. Thank you for pointing it out. The `mean()` is indeed redundant in our code. Thanks.
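For readers hitting the same DataParallel vs. DistributedDataParallel confusion, here is a minimal DDP sketch (a toy model and a `torchrun` launch, not the UniVL training script) illustrating why the `mean()` becomes redundant in that setting: each process already holds a scalar loss, no losses are gathered across processes, and gradient averaging happens via all-reduce inside `backward()`.

```python
# Minimal sketch of the DistributedDataParallel (DDP) side of the comparison
# (assumptions: toy model, single-node launch via torchrun; not the UniVL code).
# Each process computes its own scalar loss; DDP never gathers losses across
# processes -- gradients are averaged by all-reduce during backward(), so a
# .mean() on the already-scalar loss is a no-op.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    model = nn.Linear(16, 1).to(device)
    model = DDP(model, device_ids=[local_rank] if torch.cuda.is_available() else None)

    x = torch.randn(32, 16, device=device)
    y = torch.randn(32, 1, device=device)

    loss = nn.functional.mse_loss(model(x), y)  # 0-d scalar on this process
    loss.mean().backward()  # .mean() changes nothing here; grads are all-reduced by DDP

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with one process per GPU, e.g. `torchrun --nproc_per_node=2 ddp_sketch.py` (file name is illustrative), each rank sees only its own scalar loss, which is why the gather-then-average step that DataParallel needs does not appear here.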