
Regarding DDP and reversible networks

See original GitHub issue

Hi, I’m trying to figure out how to combine DDP with setting the network to be reversible.

My code basically looks like this:

import torch.nn as nn
import pytorch_lightning as pl
from performer_pytorch import Performer
...
# nn.Sequential takes modules as positional arguments (or an OrderedDict), not a list
model = nn.Sequential(..., Performer(..., reversible=True))
trainer = pl.Trainer(...,
                     distributed_backend='ddp',
                     ...)
trainer.fit(model, train_loader, val_loader)

Now all combinations work for me (ddp/not reversible, not ddp/reversible, not ddp/not reversible) except for ddp and reversible.

The error I get is:

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons:

  1. Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes
  2. Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
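[Editor's note, not from the original thread: reason 2 is exactly what a reversible network hits, because reversible blocks recompute activations during the backward pass much like gradient checkpointing. A minimal single-process sketch of that pattern, assuming a recent PyTorch where `torch.utils.checkpoint.checkpoint` accepts the `use_reentrant` flag:]

```python
# Sketch of the pattern behind reason 2: the SAME parameters are used
# inside multiple checkpointed segments, so the backward pass re-enters
# autograd once per segment. Plain autograd handles this; DDP's reducer
# does not, because it expects each parameter's gradient to become
# "ready" exactly once per step.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

shared = nn.Linear(8, 8)  # one set of parameters, reused twice

def forward(x):
    # Two checkpointed segments over the same `shared` module mean two
    # reentrant backward passes touch shared.weight / shared.bias.
    h = checkpoint(shared, x, use_reentrant=True)
    return checkpoint(shared, h, use_reentrant=True)

x = torch.randn(4, 8, requires_grad=True)
forward(x).sum().backward()
print(shared.weight.grad is not None)  # True: fine without DDP
```

[Under DDP, this same forward/backward is what trips the error above.]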

I’ve seen multiple people have similar issues: https://github.com/huggingface/transformers/issues/7160, https://github.com/pytorch/pytorch/issues/46166, https://github.com/tatp22/linformer-pytorch/issues/23

Do you have any suggestions for how to deal with this issue? I’m not really familiar with the inner workings of DDP and the autograd engine, so I’m not sure how to fix this myself.
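[Editor's note, not an answer from the thread: PyTorch documents two escape hatches for this error. Non-reentrant checkpointing (`use_reentrant=False`) keeps recomputation inside a single backward pass, which DDP's reducer accepts; newer PyTorch also offers `static_graph=True` on `DistributedDataParallel` for the reentrant case. Whether either maps cleanly onto Performer's `reversible=True` internals is an assumption. A single-process CPU sketch of the non-reentrant route:]

```python
# Single-process "cluster" (gloo, world_size=1) just to construct a
# real DDP wrapper on CPU and run a checkpointed forward/backward.
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 8)

    def forward(self, x):
        # Non-reentrant checkpointing keeps everything in ONE backward
        # pass, so each parameter is marked ready exactly once.
        h = checkpoint(self.layer, x, use_reentrant=False)
        return checkpoint(self.layer, h, use_reentrant=False)

model = DDP(Block())
model(torch.randn(4, 8)).sum().backward()  # no "marked ready only once" error
grad_ok = model.module.layer.weight.grad is not None
dist.destroy_process_group()
print(grad_ok)
```

[The same double-checkpoint pattern with `use_reentrant=True` is the documented target of `DDP(..., static_graph=True)`.]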

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
Parskatt commented, Nov 2, 2020

@lucidrains Hi. Yeah, I was cheating a bit; it’s wrapped in a Lightning module (which is also where my optimizer etc. is defined).

I saw DeepSpeed as well, I’ll give it a try! (tomorrow morning here in Sweden 😃)

0 reactions
Parskatt commented, Nov 3, 2020

Yes, everything is working (I think). Reversible works without any changes using DeepSpeed.

I had to do some major refactoring of my code, though… But I guess that’s from bad practice on my part 😉

I’ll close this for now.
