
DeeperGCN cannot use DistributedDataParallel?

See original GitHub issue

🐛 Describe the bug

When I train my model on one GPU, it works fine, but when I use multi-GPU (DistributedDataParallel) mode it breaks. [This happens with num_layers = 2; if set to 1, the error does not occur.] (Error screenshot attached to the original issue.)
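
Since no reproduction script was attached, a minimal sketch of such a setup follows. It assumes (my guess, not stated in the issue) a DeeperGCN built as in PyG's official example, i.e. GENConv wrapped in DeepGCNLayer with gradient checkpointing (ckpt_grad=True), launched via torchrun; all names and sizes are illustrative.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch_geometric.nn import GENConv, DeepGCNLayer

class DeeperGCN(torch.nn.Module):
    def __init__(self, channels=64, num_layers=2):  # num_layers=1 reportedly works
        super().__init__()
        self.layers = torch.nn.ModuleList()
        for _ in range(num_layers):
            conv = GENConv(channels, channels, aggr='softmax', num_layers=2, norm='layer')
            norm = torch.nn.LayerNorm(channels)
            act = torch.nn.ReLU()
            # ckpt_grad=True routes the layer through torch.utils.checkpoint,
            # the suspected point of conflict with DDP (see the comments below).
            self.layers.append(DeepGCNLayer(conv, norm, act, block='res+', ckpt_grad=True))

    def forward(self, x, edge_index):
        for layer in self.layers:
            x = layer(x, edge_index)
        return x

# Assumes a launch like: torchrun --nproc_per_node=<num_gpus> repro.py
dist.init_process_group('nccl')
rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(rank)
model = DDP(DeeperGCN().to(rank), device_ids=[rank])  # reported to break for num_layers >= 2
```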

Environment

  • PyG version:
  • PyTorch version:
  • OS:
  • Python version:
  • CUDA/cuDNN version:
  • How you installed PyTorch and PyG (conda, pip, source):
  • Any other relevant information (e.g., version of torch-scatter):

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
rusty1s commented, Sep 23, 2022

Thanks for the detailed information. My current understanding is that this is still a limitation of using torch.checkpoint here; I am not really sure how to resolve this on our end. Hopefully @lightaime can shed more light on this.
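
For readers who land here with the same error: the sketch below collects the usual workarounds for combining DDP with activation checkpointing, drawn from general PyTorch guidance rather than from this thread, so treat it as a starting point, not a confirmed fix.

```python
# Workaround sketch (general PyTorch guidance, not an official fix from this thread).
import torch
from torch.utils.checkpoint import checkpoint
from torch.nn.parallel import DistributedDataParallel as DDP

def checkpointed_forward(block, x):
    # (1) Non-reentrant checkpointing (PyTorch >= 1.11) is the variant
    #     documented to compose with DDP.
    return checkpoint(block, x, use_reentrant=False)

def wrap_for_ddp(model, rank):
    # (2) If every iteration uses the exact same set of parameters,
    #     declaring the graph static lets DDP tolerate checkpointing.
    return DDP(model.to(rank), device_ids=[rank], static_graph=True)

# (3) Simplest: disable checkpointing under DDP entirely, e.g. build the
#     layers with DeepGCNLayer(..., ckpt_grad=False), trading memory for
#     compatibility.
```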

0 reactions
lightaime commented, Sep 23, 2022

@WeiLong-Zh Thanks for reporting this issue! I will take a look at it. It would be helpful if you could provide a script to reproduce this issue.

Read more comments on GitHub >

Top Results From Across the Web

RuntimeError when using multiple DistributedDataParallel ...
This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) …
Read more >
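
The flag that answer alludes to is DDP's find_unused_parameters; a minimal illustration (placeholder model, torchrun launch assumed):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group('nccl')
rank = int(os.environ['LOCAL_RANK'])
model = torch.nn.Linear(16, 4).to(rank)  # stand-in for the real model
# find_unused_parameters=True makes DDP tolerate parameters that receive no
# gradient in a given iteration, at some performance cost.
ddp_model = DDP(model, device_ids=[rank], find_unused_parameters=True)
```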
Getting Started with Distributed Data Parallel - PyTorch
This tutorial starts from a basic DDP use case and then demonstrates more … DistributedDataParallel works with model parallel; DataParallel does not at …
Read more >
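
Condensed, the tutorial's basic use case looks roughly like this sketch (torchrun launch assumed; the model is a placeholder):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group('nccl')
    rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 10).to(rank)
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    inputs = torch.randn(20, 10, device=rank)
    loss = ddp_model(inputs).sum()
    loss.backward()      # DDP all-reduces gradients across ranks here
    optimizer.step()
    dist.destroy_process_group()

if __name__ == '__main__':
    main()
```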
Is DGL compatible with DDP (Distributed Data Parallel)?
Hi, I am new to using GNNs. I already have a working code base with DDP and was hoping I could re-use it…
Read more >
Distributed data parallel training in Pytorch
DistributedDataParallel. However, it doesn't give a high-level overview of what it does and provides no insight on how to use it.
Read more >
How to use Distributed Data Parallel properly in pytorch
I think you cannot initialize the model in DDP on one GPU when each process needs to share this GPU device.
Read more >
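
The device-ownership point boils down to pinning each process to its own GPU before wrapping the model, rather than letting several processes share one device; a sketch of the usual pattern:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group('nccl')
local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(local_rank)              # one process, one GPU
model = torch.nn.Linear(8, 8).to(local_rank)   # placeholder model
ddp_model = DDP(model, device_ids=[local_rank])
```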
