Regarding DDP and reversible networks
Hi, I'm trying to figure out how to combine DDP with setting the network to be reversible.
My code basically looks like this:

import torch.nn as nn
import pytorch_lightning as pl
from performer_pytorch import Performer
...
model = nn.Sequential(..., Performer(..., reversible=True))
trainer = pl.Trainer(...,
                     distributed_backend='ddp',
                     ...)
trainer.fit(model, train_loader, val_loader)
Now all combinations work for me (ddp/not reversible, not ddp/reversible, not ddp/not reversible) except for ddp and reversible.
The error I get is:
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons:
- Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes.
- Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
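For what it's worth, the second bullet is exactly the situation a reversible network creates: the same parameters participate in re-entrant backward passes, much like activation checkpointing. Newer PyTorch releases (1.11+) expose a `static_graph` option on DistributedDataParallel that is meant for this class of use case. A minimal single-process sketch of the flag (not this repo's Lightning setup; a plain `nn.Linear` stands in for the real model, and the address/port are arbitrary):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Single-process "DDP" setup purely for illustration; in real training
    # each rank runs this with its own rank/world_size.
    dist.init_process_group(
        backend="gloo",
        init_method="tcp://127.0.0.1:29500",
        rank=0,
        world_size=1,
    )
    model = torch.nn.Linear(8, 8)
    # static_graph=True tells DDP that the set of parameters used in every
    # iteration is fixed, which allows the same parameters to take part in
    # multiple (re-entrant) backward passes, as with checkpointing.
    ddp_model = DDP(model, static_graph=True)
    x = torch.randn(4, 8)
    loss = ddp_model(x).sum()
    loss.backward()
    dist.destroy_process_group()
    return loss.item()
```

Whether Lightning's `distributed_backend='ddp'` exposes this flag depends on the Lightning version, so it may require wrapping the model manually.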
I’ve seen multiple people have similar issues: https://github.com/huggingface/transformers/issues/7160, https://github.com/pytorch/pytorch/issues/46166, https://github.com/tatp22/linformer-pytorch/issues/23
Do you have any suggestions for how to deal with this issue? I'm not really familiar with the inner workings of DDP and the autograd engine, so I'm not sure how to fix this myself.
Issue Analytics
- Created 3 years ago
- Comments: 11 (4 by maintainers)
Top GitHub Comments
@lucidrains Hi. Yeah, I was cheating a bit; it's wrapped in a Lightning module (which is also where my optimizer etc. is defined).
I saw DeepSpeed as well, I'll give it a try! (tomorrow morning here in Sweden 😃)
Yes, everything is working (I think). Reversible works without any changes using DeepSpeed.
I had to do some major refactoring of my code though… But I guess that's from bad practice on my part 😉
I’ll close this for now.
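For anyone landing here later: the commenter doesn't show the change, but in PyTorch Lightning switching the distributed backend to DeepSpeed is a Trainer-level configuration change. A hedged sketch (argument names vary by Lightning release; recent versions use `strategy` in place of the older `distributed_backend`, and `"deepspeed_stage_2"` here is just one of the registered strategy strings):

```python
import pytorch_lightning as pl

# Config-only sketch: same LightningModule and dataloaders as before,
# only the distributed strategy changes from DDP to DeepSpeed.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="deepspeed_stage_2",  # or plugins='deepspeed' on older Lightning
)
```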