How to use all_gather in a training loop?
I have defined my train_step in the exact same way as in the cifar10 example. Is it possible to gather all of the predictions before computing the loss? I haven't seen examples of this pattern in the ignite examples (maybe I'm missing it?), but for my application it is better to compute the loss after aggregating the forward passes and targets run on multiple GPUs. This only matters when using DistributedDataParallel, since DataParallel automatically aggregates the outputs.

I see the idist.all_gather() function, but I am unclear how to use it in a training loop.
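For context, my understanding of the basic mechanics of idist.all_gather is roughly the following (a standalone sketch, not my actual training code; it assumes a GPU/NCCL setup launched with torchrun, otherwise backend="gloo" would be needed):

```python
import torch
import ignite.distributed as idist

def run(local_rank):
    # each process creates a small tensor filled with its own rank
    preds = torch.full((4,), float(idist.get_rank()), device=idist.device())
    # all_gather concatenates the per-process tensors along dim 0,
    # and every process receives the same concatenated result
    all_preds = idist.all_gather(preds)   # shape: (4 * world_size,)
    print(idist.get_rank(), all_preds.tolist())

# launched e.g. with `torchrun --nproc_per_node=2 this_script.py`;
# use backend="gloo" if no GPUs / NCCL are available
with idist.Parallel(backend="nccl") as parallel:
    parallel.run(run)
```

What I don't see is how to plug this into a train_step so that the gathered predictions can still be used to compute a loss and backpropagate.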
Ok, I understand. You should have a look at a distributed implementation of SimCLR. See, for instance:
https://github.com/Spijkervet/SimCLR/blob/cd85c4366d2e6ac1b0a16798b76ac0a2c8a94e58/simclr/modules/nt_xent.py#L7
This might give you some inspiration.
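The relevant piece in that file is an autograd-aware gather. Roughly (a sketch of the pattern, not a verbatim copy of the linked code), it looks like this:

```python
import torch
import torch.distributed as dist

class GatherLayer(torch.autograd.Function):
    """all_gather that participates in autograd: forward gathers the tensor
    from every rank, backward returns the gradient slice owned by this rank."""

    @staticmethod
    def forward(ctx, x):
        gathered = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, x)   # the gathered copies carry no autograd graph
        return tuple(gathered)

    @staticmethod
    def backward(ctx, *grads):
        # only the gradient slice corresponding to this rank's input flows back
        return grads[dist.get_rank()].contiguous()

# usage: z_all = torch.cat(GatherLayer.apply(z), dim=0)
```

Some implementations instead sum the incoming gradients across ranks in backward, and, if I'm not mistaken, recent PyTorch also ships an autograd-aware all_gather in torch.distributed.nn; which variant is appropriate depends on whether every rank computes the full loss.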
Hi @vfdev-5, sure.

We are using the Supervised Contrastive loss to train an embedding. In Eq. 2 of the paper, the loss depends on the number of samples (positive and negative) used to compute it. My colleague suggested that it is better to compute the loss over the entire global batch, rather than over the batch/ngpu samples each process sees (which is what happens when using DDP and computing the loss locally on each GPU). This is because the denominator of SupConLoss sums over the negative samples, so aggregating all of the negatives across GPUs first gives a more accurate loss.
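To make this concrete, the train_step I have in mind would look roughly like the sketch below. Here encoder, optimizer and supcon_loss are placeholders for my model, optimizer and whatever SupConLoss implementation ends up being used, and GatherLayer refers to the sketch above:

```python
import torch
import torch.nn.functional as F
import ignite.distributed as idist

def train_step(engine, batch):
    encoder.train()
    x, y = batch[0].to(idist.device()), batch[1].to(idist.device())
    z = F.normalize(encoder(x), dim=1)   # local embeddings

    if idist.get_world_size() > 1:
        # gather embeddings with gradients (GatherLayer) and labels without
        z_all = torch.cat(GatherLayer.apply(z), dim=0)
        y_all = idist.all_gather(y)
    else:
        z_all, y_all = z, y

    # the denominator of the SupCon loss now sums over negatives from the
    # whole global batch, not just the batch/ngpu samples on this GPU
    loss = supcon_loss(z_all, y_all)   # placeholder SupConLoss callable

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```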