
DeeperGCN cannot use DistributedDataParallel?

See original GitHub issue

🐛 Describe the bug

When I train my model on one GPU, it works fine, but when I use multi-GPU (DistributedDataParallel) mode it breaks. [This happens with num_layers = 2; if set to 1, the error does not occur.] (Error screenshot attached to the original issue.)
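
Since no reproduction script was attached, a minimal sketch of such a setup follows. It assumes (my guess, not stated in the issue) a DeeperGCN built as in PyG's official example, i.e. GENConv wrapped in DeepGCNLayer with gradient checkpointing (ckpt_grad=True), launched via torchrun; all names and sizes are illustrative.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch_geometric.nn import GENConv, DeepGCNLayer

class DeeperGCN(torch.nn.Module):
    def __init__(self, channels=64, num_layers=2):  # num_layers=1 reportedly works
        super().__init__()
        self.layers = torch.nn.ModuleList()
        for _ in range(num_layers):
            conv = GENConv(channels, channels, aggr='softmax', num_layers=2, norm='layer')
            norm = torch.nn.LayerNorm(channels)
            act = torch.nn.ReLU()
            # ckpt_grad=True routes the layer through torch.utils.checkpoint,
            # the suspected point of conflict with DDP (see the comments below).
            self.layers.append(DeepGCNLayer(conv, norm, act, block='res+', ckpt_grad=True))

    def forward(self, x, edge_index):
        for layer in self.layers:
            x = layer(x, edge_index)
        return x

# Assumes a launch like: torchrun --nproc_per_node=<num_gpus> repro.py
dist.init_process_group('nccl')
rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(rank)
model = DDP(DeeperGCN().to(rank), device_ids=[rank])  # reported to break for num_layers >= 2
```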

Environment

  • PyG version:
  • PyTorch version:
  • OS:
  • Python version:
  • CUDA/cuDNN version:
  • How you installed PyTorch and PyG (conda, pip, source):
  • Any other relevant information (e.g., version of torch-scatter):

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
rusty1s commented, Sep 23, 2022

Thanks for the detailed information. My current understanding is that this is still a limitation of using torch.checkpoint here; I am not really sure how to resolve this on our end. Hopefully @lightaime can shed more light on this.
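
For readers who land here with the same error: the sketch below collects the usual workarounds for combining DDP with activation checkpointing, drawn from general PyTorch guidance rather than from this thread, so treat it as a starting point, not a confirmed fix.

```python
# Workaround sketch (general PyTorch guidance, not an official fix from this thread).
import torch
from torch.utils.checkpoint import checkpoint
from torch.nn.parallel import DistributedDataParallel as DDP

def checkpointed_forward(block, x):
    # (1) Non-reentrant checkpointing (PyTorch >= 1.11) is the variant
    #     documented to compose with DDP.
    return checkpoint(block, x, use_reentrant=False)

def wrap_for_ddp(model, rank):
    # (2) If every iteration uses the exact same set of parameters,
    #     declaring the graph static lets DDP tolerate checkpointing.
    return DDP(model.to(rank), device_ids=[rank], static_graph=True)

# (3) Simplest: disable checkpointing under DDP entirely, e.g. build the
#     layers with DeepGCNLayer(..., ckpt_grad=False), trading memory for
#     compatibility.
```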

0 reactions
lightaime commented, Sep 23, 2022

@WeiLong-Zh Thanks for reporting this issue! I will take a look at it. It would be helpful if you could provide a script to reproduce this issue.

Read more comments on GitHub >

Top Results From Across the Web

RuntimeError when using multiple DistributedDataParallel ...
This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) …
Read more >
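
The flag that answer alludes to is DDP's find_unused_parameters; a minimal illustration (placeholder model, torchrun launch assumed):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group('nccl')
rank = int(os.environ['LOCAL_RANK'])
model = torch.nn.Linear(16, 4).to(rank)  # stand-in for the real model
# find_unused_parameters=True makes DDP tolerate parameters that receive no
# gradient in a given iteration, at some performance cost.
ddp_model = DDP(model, device_ids=[rank], find_unused_parameters=True)
```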
Getting Started with Distributed Data Parallel - PyTorch
This tutorial starts from a basic DDP use case and then demonstrates more … DistributedDataParallel works with model parallel; DataParallel does not at …
Read more >
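
Condensed, the tutorial's basic use case looks roughly like this sketch (torchrun launch assumed; the model is a placeholder):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group('nccl')
    rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 10).to(rank)
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    inputs = torch.randn(20, 10, device=rank)
    loss = ddp_model(inputs).sum()
    loss.backward()      # DDP all-reduces gradients across ranks here
    optimizer.step()
    dist.destroy_process_group()

if __name__ == '__main__':
    main()
```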
Is DGL compatible with DDP (Distributed Data Parallel)?
Hi, I am new to using GNNs. I already have a working code base with DDP and was hoping I could re-use it…
Read more >
Distributed data parallel training in Pytorch
DistributedDataParallel. However, it doesn't give a high-level overview of what it does and provides no insight on how to use it.
Read more >
How to use Distributed Data Parallel properly in pytorch
I think you cannot initialize the model in DDP on one GPU when each process needs to share this GPU device.
Read more >
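
The device-ownership point boils down to pinning each process to its own GPU before wrapping the model, rather than letting several processes share one device; a sketch of the usual pattern:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group('nccl')
local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(local_rank)              # one process, one GPU
model = torch.nn.Linear(8, 8).to(local_rank)   # placeholder model
ddp_model = DDP(model, device_ids=[local_rank])
```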
