Seeking suggestions for embedding into ddp
Hi, PyTorch 1.8 has a new hook, torch.nn.parallel.DistributedDataParallel.register_comm_hook(). Any advice on how to integrate GRACE into DDP using the dist examples?
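For context, a DDP communication hook receives a gradient bucket and must return a torch.futures.Future wrapping the reduced tensor. Below is a minimal sketch against a recent PyTorch version, assuming a hypothetical `state.compressor` with `compress`/`decompress` methods; this is not the actual GRACE interface, just one way a compressor could be plugged in via register_comm_hook.

```python
import torch
import torch.distributed as dist


def compression_comm_hook(state, bucket):
    """Minimal sketch of a DDP communication hook.

    ``state`` is assumed to carry a GRACE-like compressor exposing
    hypothetical ``compress``/``decompress`` methods; the real GRACE
    interface may differ.
    """
    tensor = bucket.buffer()          # flattened gradients of this bucket
    world_size = dist.get_world_size()

    # Hypothetical compression/decompression round-trip (e.g. top-k
    # sparsification). A real integration would communicate the
    # compressed representation instead of the dense tensor.
    compressed, ctx = state.compressor.compress(tensor)
    dense = state.compressor.decompress(compressed, ctx)

    # Plain asynchronous allreduce of the dense tensor, averaged afterwards.
    fut = dist.all_reduce(dense, op=dist.ReduceOp.SUM, async_op=True).get_future()

    def average(fut):
        return fut.value()[0] / world_size

    return fut.then(average)


# Registration, assuming ``ddp_model`` is a DistributedDataParallel instance:
# ddp_model.register_comm_hook(state=some_state, hook=compression_comm_hook)
```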
Issue Analytics
- Created 2 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
dist.all_reduce simply sums up dense inputs across the nodes; it does not support the value-index pair sparse tensor format. To perform an allreduce for sparse tensors, you need to use allgather to collect all the sparse tensors, cast them into dense format locally, and then sum them up. In GRACE, we have developed an allgather for tensors with different lengths, please check here.
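To illustrate the idea of gathering value-index sparse tensors of different lengths, here is a rough sketch using torch.distributed. It is not the GRACE implementation; it just shows one common approach of padding to the maximum length, running a regular all_gather, and trimming locally.

```python
import torch
import torch.distributed as dist


def allgather_varlen(tensor):
    """Gather 1-D tensors of different lengths from all ranks (sketch only)."""
    world_size = dist.get_world_size()

    # 1. Exchange the lengths so every rank knows how much to trim.
    local_len = torch.tensor([tensor.numel()], device=tensor.device)
    lens = [torch.zeros_like(local_len) for _ in range(world_size)]
    dist.all_gather(lens, local_len)
    lens = [int(l.item()) for l in lens]
    max_len = max(lens)

    # 2. Pad to a common length and gather.
    padded = torch.zeros(max_len, dtype=tensor.dtype, device=tensor.device)
    padded[: tensor.numel()] = tensor
    gathered = [torch.zeros_like(padded) for _ in range(world_size)]
    dist.all_gather(gathered, padded)

    # 3. Trim each gathered tensor back to its original length.
    return [g[:n] for g, n in zip(gathered, lens)]
```

With the gathered (values, indices) pairs from every rank, each worker can densify locally and sum them to complete the sparse allreduce.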
Thank you very much. Your implementation is really good. I do have the following two suggestions for your optimization: in DgcCompressor.decompress() there is a for loop to cast the sparse tensor into dense format, which can be very expensive for large gradients. You may want to use the scatter_add API to gain some speed.
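As a rough illustration of the scatter_add suggestion (the names `values`, `indices`, and `numel` below are placeholders, not the actual DgcCompressor internals), the loop-based densification can be replaced by a single vectorized call:

```python
import torch


# Loop-based densification (what a Python for-loop over the sparse
# entries amounts to) -- slow for large gradients:
def densify_loop(values, indices, numel):
    dense = torch.zeros(numel, dtype=values.dtype, device=values.device)
    for v, i in zip(values, indices):
        dense[i] += v
    return dense


# Vectorized equivalent using scatter_add_, which also accumulates
# duplicate indices correctly:
def densify_scatter(values, indices, numel):
    dense = torch.zeros(numel, dtype=values.dtype, device=values.device)
    dense.scatter_add_(0, indices.long(), values)
    return dense
```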
Regarding the poor performance: are you comparing with GRACE DGC or with the no-compression baseline? Also please note that gradient compression is not always beneficial, depending on the model architecture, network conditions, and number of nodes. Could you please specify your testing environment?