Is torch.distributed.all_reduce working as expected?
See original GitHub issue

This line https://github.com/facebookresearch/barlowtwins/blob/main/main.py#L208 uses torch.distributed.all_reduce to sum the correlation matrices across all GPUs. However, as far as I know, this op is not designed for a forward computation that will later be backpropagated: gradients do not flow back through the communication. To apply a correctly differentiable distributed all-reduce, the official PyTorch documentation instead recommends the autograd-enabled primitives in torch.distributed.nn.*: https://pytorch.org/docs/stable/distributed.html#autograd-enabled-communication-primitives
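For concreteness, a minimal sketch of the two variants (the function name, shapes, and variable names are hypothetical, and it assumes an already-initialized process group):

```python
import torch
import torch.distributed as dist
import torch.distributed.nn as dist_nn  # autograd-enabled collectives


def reduce_correlation(z1, z2, differentiable=False):
    """Sum the per-GPU cross-correlation matrices across all processes.

    z1, z2: hypothetical (N, D) embedding batches on the local rank.
    Assumes dist.init_process_group() has already been called.
    """
    c = z1.T @ z2  # local (D, D) correlation matrix

    if differentiable:
        # torch.distributed.nn.all_reduce returns a new tensor and also
        # communicates gradients across ranks in the backward pass.
        c = dist_nn.all_reduce(c)
    else:
        # torch.distributed.all_reduce sums in place; autograd does not
        # track the communication, so each rank backpropagates only
        # through its own local contribution to the sum.
        dist.all_reduce(c)
    return c
```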
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 1
- Comments: 8 (3 by maintainers)
Top Results From Across the Web
- Distributed communication package - torch.distributed - PyTorch: The torch.distributed package provides PyTorch support and communication primitives for multiprocess parallelism across several computation nodes running on ...
- Distributed.all_reduce bandwidth expectations: I want to benchmark how quickly PyTorch with the Gloo backend is able to all-reduce/all-gather a model synchronously.
- Too much time spent in `ncclKernel AllReduce`? - distributed: When distributing the training, is it expected that half of GPU time is spent on ncclKernel_AllReduce_RING_LL_Sum_float? Below are more details ...
- PyTorch Distributed Overview: Use torch.distributed.elastic to launch distributed training if errors (e.g., out-of-memory) are expected or if resources can join and leave dynamically ...
- Distributed.all_reduce returns strange results - PyTorch Forums: Is the above error expected? How did you handle this? If this is handled by skipping/redoing that iteration, it might cause allreduce mismatch ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
After doing some tests with a differentiable all-gather, I realize that your implementation is an equivalent version. Very smart tricks.
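For context, a differentiable all-gather is typically written as a custom autograd.Function along these lines (a common pattern in contrastive-learning codebases, not the implementation referred to above; GatherLayer is a hypothetical name):

```python
import torch
import torch.distributed as dist


class GatherLayer(torch.autograd.Function):
    """All-gather that lets gradients flow back to the local input."""

    @staticmethod
    def forward(ctx, x):
        gathered = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, x)
        return tuple(gathered)

    @staticmethod
    def backward(ctx, *grads):
        # Each rank only sees the gradients produced by its own loss, so
        # sum them across ranks and keep the slice that belongs to the
        # local input.
        all_grads = torch.stack(grads)
        dist.all_reduce(all_grads)
        return all_grads[dist.get_rank()]
```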
I can confirm that torch.distributed.nn.all_reduce is mathematically incorrect: https://github.com/pytorch/pytorch/issues/58005. torch.distributed.all_reduce is correct, but it seems to be so by accident rather than by design.
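A sketch of why the non-differentiable collective still ends up with the right update (this is an inference, not spelled out in the thread): every rank sees the same reduced matrix C = Σ_r C_r, evaluates the same loss L(C), and DDP subsequently averages the parameter gradients across ranks.

```latex
% Desired gradient of the single global loss L(C), with C = \sum_r C_r:
\frac{\partial L}{\partial \theta}
  = \sum_r \frac{\partial L}{\partial C}\,\frac{\partial C_r}{\partial \theta}.
% With the plain all_reduce, rank r backpropagates only through its local
% C_r, so DDP's sum over ranks reproduces exactly the same quantity
% (up to the usual 1/world_size averaging factor):
\sum_r
  \underbrace{\frac{\partial L}{\partial C}\,\frac{\partial C_r}{\partial \theta}}
            _{\text{gradient computed on rank } r}
  = \frac{\partial L}{\partial \theta}.
% The autograd-enabled all_reduce additionally all-reduces
% \partial L / \partial C in the backward pass; since that gradient is
% identical on every rank, this multiplies each per-rank gradient by the
% world size, which under the same assumptions scales the final update.
```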