
Is torch.distributed.all_reduce working as expected?

See original GitHub issue

This line https://github.com/facebookresearch/barlowtwins/blob/main/main.py#L208 uses torch.distributed.all_reduce to sum the correlation matrices across all GPUs. However, as far as I know, this op is not meant for a forward computation that will later be backpropagated through, since it is not tracked by autograd. To get a correctly differentiable distributed all-reduce, the official PyTorch documentation recommends the autograd-enabled primitives in torch.distributed.nn.*: https://pytorch.org/docs/stable/distributed.html#autograd-enabled-communication-primitives
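For anyone who wants to see the difference concretely, below is a minimal sketch (mine, not code from barlowtwins) contrasting the two calls. It assumes two or more processes launched with torchrun and a Gloo process group; the file name and tensor shapes are arbitrary. Plain dist.all_reduce is invisible to autograd, so only the local contribution is backpropagated, whereas torch.distributed.nn.all_reduce also all-reduces the incoming gradient in its backward pass.

```python
# Minimal sketch; launch with e.g. `torchrun --nproc_per_node=2 allreduce_grad_check.py`
# (the file name is arbitrary).
import torch
import torch.distributed as dist
import torch.distributed.nn  # autograd-enabled collectives


def main():
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    x = torch.ones(2, 2, requires_grad=True)
    c = x * (rank + 1)  # stand-in for a per-process correlation matrix

    # Variant 1: plain all_reduce (what main.py#L208 uses). The in-place sum
    # is not recorded by autograd, so backward only sees the local branch.
    c1 = c.clone()
    dist.all_reduce(c1)
    c1.sum().backward(retain_graph=True)
    grad_plain = x.grad.clone()
    x.grad = None

    # Variant 2: autograd-enabled all_reduce. Its backward runs another
    # all_reduce on the incoming gradient, so contributions from every
    # process are summed into x.grad.
    c2 = torch.distributed.nn.all_reduce(c)
    c2.sum().backward()
    grad_autograd = x.grad.clone()

    if rank == 0:
        print("plain dist.all_reduce grad:\n", grad_plain)
        print("torch.distributed.nn.all_reduce grad:\n", grad_autograd)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With two processes, the second gradient comes out exactly world_size times the first on each rank, which is the scaling the comments below are concerned with.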

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
WarBean commented, Mar 24, 2021

After doing some tests with a differentiable all-gather, I realized that your implementation is an equivalent version. Very smart trick.
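For context, a "differentiable allgather" usually looks something like the sketch below: a custom autograd.Function whose forward gathers every process's tensor and whose backward sums each slice's gradient across processes. This is my own illustration of the pattern being referred to, not code from this repository; the class name is made up.

```python
# Illustrative sketch of a differentiable all-gather. Assumes torch.distributed
# is already initialized; the class name is not from barlowtwins.
import torch
import torch.distributed as dist


class AllGatherWithGrad(torch.autograd.Function):
    """All-gather whose backward sums each slice's gradient across processes."""

    @staticmethod
    def forward(ctx, tensor):
        tensor = tensor.contiguous()
        world_size = dist.get_world_size()
        gathered = [torch.zeros_like(tensor) for _ in range(world_size)]
        dist.all_gather(gathered, tensor)
        return tuple(gathered)

    @staticmethod
    def backward(ctx, *grad_outputs):
        # grad_outputs[k] is this process's gradient w.r.t. the tensor that
        # originally lived on process k. Summing across processes gives the
        # full gradient; each process then keeps the slice for its own input.
        stacked = torch.stack(grad_outputs)
        dist.all_reduce(stacked)
        return stacked[dist.get_rank()]
```

Using AllGatherWithGrad.apply(z) in place of a plain dist.all_gather lets gradients flow back to every process's local batch, which seems to be the equivalence being described here.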

0 reactions
LinxiFan commented, May 21, 2021

I can confirm that torch.distributed.nn.all_reduce is mathematically incorrect (see https://github.com/pytorch/pytorch/issues/58005). torch.distributed.all_reduce is correct, but that seems to be by accident rather than by design.
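For readers trying to reconcile the two comments above, here is a back-of-the-envelope sketch of where a factor of the world size W can show up. It assumes, as in this repository, that every process computes the same loss from the reduced matrix; it is my own illustration, not a summary of the linked PyTorch issue.

```latex
% W processes, local correlation matrices c_r, reduced matrix
% c = \sum_{r=1}^{W} c_r, and every process evaluates the same loss L(c).

% torch.distributed.all_reduce is not tracked by autograd, so process r
% backpropagates only through its local branch:
\frac{\partial L}{\partial c_r}\bigg|_{\text{plain}} = \frac{\partial L}{\partial c}

% torch.distributed.nn.all_reduce additionally all-reduces the incoming
% gradient in backward, and that incoming gradient is identical on every
% process, so each process instead receives
\frac{\partial L}{\partial c_r}\bigg|_{\text{autograd-enabled}}
  = \sum_{k=1}^{W} \frac{\partial L}{\partial c}
  = W \, \frac{\partial L}{\partial c}
```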


Top Results From Across the Web

  • Distributed communication package - torch.distributed - PyTorch: "The torch.distributed package provides PyTorch support and communication primitives for multiprocess parallelism across several computation nodes running on ..."
  • Distributed.all_reduce bandwidth expectations: "I want to benchmark how quickly PyTorch with the Gloo backend is able to all-reduce/all-gather a model synchronously."
  • Too much time spent in `ncclKernel AllReduce`? - distributed: "When distributing the training, is it expected that half of GPU time is spent on ncclKernel_AllReduce_RING_LL_Sum_float? Below are more details ..."
  • PyTorch Distributed Overview: "Use torch.distributed.elastic to launch distributed training if errors (e.g., out-of-memory) are expected or if resources can join and leave dynamically ..."
  • Distributed.all_reduce returns strange results - PyTorch Forums: "Is the above error expected? How did you handle this? If this is handled by skipping/redoing that iteration, it might cause an allreduce mismatch...."
