
How to use all_gather in a training loop?

See original GitHub issue

I have defined my train_step in exactly the same way as in the cifar10 example. Is it possible to gather all of the predictions before computing the loss? I haven’t seen examples of this pattern in the ignite examples (maybe I’m missing it?), but for my application it is better to compute the loss after aggregating the forward passes and targets run on multiple GPUs. This only matters when using DistributedDataParallel, since DataParallel aggregates the outputs automatically.

I see the idist.all_gather() function, but am unclear how to use it in a training loop.
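
For illustration, here is a rough, untested sketch of the pattern in question, assuming the same kind of model/optimizer/criterion setup as the cifar10 example (all names are placeholders). Note that idist.all_gather returns concatenated copies, so gradients do not flow through the gathered tensor; re-inserting the local block keeps gradients for the samples owned by this rank, while cross-rank gradient terms are still dropped (an autograd-aware gather is sketched further down the thread).

import torch
import ignite.distributed as idist

def train_step(engine, batch):
    model.train()
    x, y = batch
    x, y = x.to(idist.device()), y.to(idist.device())

    y_pred = model(x)

    # Gather predictions and targets from every process.
    # idist.all_gather concatenates along dim 0 in rank order.
    bs, rank = y_pred.shape[0], idist.get_rank()
    all_preds = idist.all_gather(y_pred.detach())
    # Swap the local block back in so loss.backward() still reaches the model.
    all_preds = torch.cat(
        [all_preds[: rank * bs], y_pred, all_preds[(rank + 1) * bs:]], dim=0
    )
    all_targets = idist.all_gather(y)

    loss = criterion(all_preds, all_targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()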

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

2 reactions
sdesrozis commented, Mar 8, 2022

OK, I understand. You should have a look at a distributed implementation of SimCLR. See, for instance:

https://github.com/Spijkervet/SimCLR/blob/cd85c4366d2e6ac1b0a16798b76ac0a2c8a94e58/simclr/modules/nt_xent.py#L7

This might give you some inspiration.
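
The key piece in such implementations is an autograd-aware gather, so the loss can be computed on the full (world_size x batch) set of features while gradients still reach each local replica. A minimal sketch along those lines, assuming torch.distributed has already been initialized:

import torch
import torch.distributed as dist

class GatherLayer(torch.autograd.Function):
    """all_gather with a backward pass."""

    @staticmethod
    def forward(ctx, x):
        out = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
        dist.all_gather(out, x)
        return tuple(out)

    @staticmethod
    def backward(ctx, *grads):
        grads = torch.stack(grads)
        dist.all_reduce(grads)  # sum gradient contributions from all ranks
        return grads[dist.get_rank()]

# Usage: build a (world_size * batch, dim) tensor that participates in autograd
# z_all = torch.cat(GatherLayer.apply(z), dim=0)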

0 reactions
kkarrancsu commented, Mar 8, 2022

Hi @vfdev-5, sure.

We are using the Supervised Contrastive loss to train an embedding. In Eq. 2 of the paper, we see that the loss depends on the number of samples used to compute it (positive and negative).
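
For reference, that loss has roughly the following form (tau is the temperature, P(i) the set of positives for anchor i, and A(i) all other samples in the batch), which is why the denominator grows with the number of negatives available:

\mathcal{L}^{sup} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)}
  \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}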

My colleague suggested that it is better to compute the loss over all examples (the entire global batch), rather than over batch/ngpu samples (which is what happens when using DDP and computing the loss locally on each GPU). This is because the denominator in SupConLoss sums over the negative samples, so aggregating all of the negatives across GPUs first gives a more accurate loss.
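
Concretely, one hedged sketch of that pattern, reusing the GatherLayer idea above inside the train_step from the question (z and y are the local embeddings and labels, and criterion stands in for a SupConLoss implementation):

import torch
import ignite.distributed as idist

# Inside train_step, after the local forward pass z = model(x):
z_all = torch.cat(GatherLayer.apply(z), dim=0)  # embeddings from all GPUs, gradients preserved
y_all = idist.all_gather(y)                     # labels need no gradient
loss = criterion(z_all, y_all)                  # loss now sees every negative in the global batch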
