Consider changing the semantics of `Module.setup()` to be called immediately during init rather than when a module is bound
Right now, we tell users that `setup` acts like `__init__`, but this isn’t completely true, as `setup` is only called when a Module is bound, that is, when it has variables. Therefore, the following code doesn’t work:
    class ResNet(nn.Module):
        def setup(self):
            self.backbone = Backbone()
            self.classifier = Classifier()

    ResNet().backbone  # not available
A workaround is to expose the backbone as a method, e.g.
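    class ResNet(nn.Module):
        # Factory methods are callable on an unbound module.
        def make_backbone(self):
            return Backbone()

        def make_classifier(self):
            return Classifier()

        def setup(self):
            self.backbone = self.make_backbone()
            self.classifier = self.make_classifier()

    ResNet().make_backbone()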
But this is quite confusing.
The proposal here, in short, is to make `setup` actually fire immediately during module construction, but defer variables from actually being present until the module is bound. This means that accessing the value of parameters defined during `setup` on an unbound module would throw an error.
There are some open design questions about this proposal, but I think it would simplify transfer learning use cases, as well as the mental model around `setup`.
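To make the proposed semantics concrete, here is a minimal pure-Python sketch. It is a toy model, not Flax code: `ToyModule`, `DeferredVariable`, `param`, and this `bind` are hypothetical names invented for illustration, and unlike Flax this `bind` mutates the module in place. It only shows the intended behaviour: `setup` runs during construction, submodules are reachable right away, and reading a variable's value fails until the module is bound.

```python
class DeferredVariable:
    """Hypothetical placeholder: declared in setup(), valueless until bound."""

    def __init__(self, name):
        self.name = name
        self._value = None
        self._bound = False

    def _set(self, value):
        self._value = value
        self._bound = True

    @property
    def value(self):
        if not self._bound:
            raise RuntimeError(
                f"variable {self.name!r} has no value until the module is bound"
            )
        return self._value


class ToyModule:
    """Toy stand-in for a module whose setup() fires eagerly in __init__."""

    def __init__(self):
        self._variables = {}
        self.setup()  # eager: runs at construction, not at bind time

    def setup(self):
        pass

    def param(self, name):
        # Declare a variable without giving it a value yet.
        var = DeferredVariable(name)
        self._variables[name] = var
        return var

    def bind(self, values):
        # Fill in this module's variables, then recurse into submodules.
        for name, var in self._variables.items():
            var._set(values[name])
        for name, child in vars(self).items():
            if isinstance(child, ToyModule):
                child.bind(values[name])
        return self


class Backbone(ToyModule):
    def setup(self):
        self.kernel = self.param("kernel")


class ResNet(ToyModule):
    def setup(self):
        self.backbone = Backbone()


model = ResNet()
backbone = model.backbone  # available immediately: setup() already ran

error_message = None
try:
    backbone.kernel.value  # unbound: reading the value raises
except RuntimeError as err:
    error_message = str(err)

model.bind({"backbone": {"kernel": [1.0, 2.0]}})
bound_value = model.backbone.kernel.value  # readable once bound
```

In this sketch the transfer-learning case reduces to `ResNet().backbone`, with no factory-method workaround, while the error on `.value` keeps unbound modules from silently exposing missing parameters.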
@jheek and I have discussed this, and we may investigate it soon. I believe the change should mostly not impact user code, though modules that define variables during `setup()` and access their value may need more careful thought.
Issue created 3 years ago.
Top GitHub Comments
Actually, historically we went in the opposite direction, toward calling `setup()` lazily. We were forced to do this in order to avoid an exponential-time blow-up via double recursion, which occurs with any eager initialization scheme in the presence of nested transformations (which currently occur frequently, for instance when using JAX’s `named_call` machinery to label profiling traces). As such, I’m closing this old issue.

@shashank2000 - I actually have no idea what that sentence is even talking about. I’ll have to figure out what the author meant by that and remove or redact that statement. It would be better to open a new issue if you have a specific question about initialization or writing a model.
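For readers wondering how the exponential blow-up mentioned in the closing comment above can arise, the following is a rough counting sketch. It does not describe Flax internals. It assumes, purely for illustration, that under eager initialization every nesting level sets up its child once directly and once more on a copy made by a wrapping transformation, while under lazy initialization each level is set up only once, when it is actually used. The function names are hypothetical.

```python
def eager_setup_calls(depth):
    """setup() calls for a chain of `depth` nested children when every
    level triggers setup of its child twice (assumed double recursion)."""
    if depth == 0:
        return 1
    return 1 + 2 * eager_setup_calls(depth - 1)


def lazy_setup_calls(depth):
    """setup() calls when each level is set up exactly once, on first use."""
    if depth == 0:
        return 1
    return 1 + lazy_setup_calls(depth - 1)


eager_counts = [eager_setup_calls(d) for d in range(5)]  # [1, 3, 7, 15, 31]
lazy_counts = [lazy_setup_calls(d) for d in range(5)]    # [1, 2, 3, 4, 5]
```

Under these assumptions the eager count follows T(d) = 2T(d-1) + 1, which grows as 2^(d+1) - 1, whereas the lazy count grows linearly in the nesting depth.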
Does this issue have anything to do with the initialization of the “last” submodule like here? (The line right above the header linked to)
I’m not sure how to approach this.