How to use l2l.algorithms.MAML correctly with nn.DistributedDataParallel?
This work is awesome!
Using `nn.DistributedDataParallel` in the following way raises an error when `learner = maml.clone()` is executed. How should I use it correctly? Should I wrap `MyModel` with `nn.DistributedDataParallel` first and then pass it to MAML?
Thanks!
```python
import torch.nn as nn
import learn2learn as l2l

model = MyModel()
maml = l2l.algorithms.MAML(model, lr=0.5)
model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])
…
learner = maml.clone()  # the error is raised here
```
Issue Analytics
- State:
- Created: 3 years ago
- Comments: 13 (5 by maintainers)
Top Results From Across the Web

Does it support the model wrapped by DistributedDataParallel?
Correct, there are 2 ways to go about distributed MAML: either use DDP + LightningMAML, or use torch.distributed + cherry. Choose the one...

learn2learn.algorithms
MAML (BaseLearner) ... High-level implementation of Model-Agnostic Meta-Learning. This class wraps an arbitrary nn.Module and augments it with clone() and adapt ...

Distributed data parallel training in Pytorch
Pytorch provides a tutorial on distributed training using AWS, ... However, it doesn't have code examples of how to use nn.DataParallel.

Making Meta-Learning Easily Accessible on PyTorch - Medium
The goal of a meta-learning algorithm is to use training experience to ... Model Agnostic Meta Learning (MAML) is a popular gradient-based ...

How distributed training works in Pytorch - AI Summer
In this tutorial, we will learn how to use nn.parallel.DistributedDataParallel for training our models in multiple GPUs.
Read more >Top Related Medium Post
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hello @AyanamiReiFan, and thanks for the kind words.
Parallelizing MAML with `DistributedDataParallel` is a bit tricky, as the implementation relies on gradient hooks which don't play well with `clone`/`grad`. If you want to use `torch.distributed` to parallelize the training loop, cherry's `Distributed` optimizer is another option:
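A rough sketch of that approach, assuming cherry exposes a `cherry.optim.Distributed` wrapper that averages gradients across processes on `step()` and can broadcast initial weights via `sync_parameters()`; `MyModel`, `compute_adaptation_loss`, and `compute_meta_loss` are placeholders:

```python
import torch
import torch.distributed as dist
import learn2learn as l2l
import cherry

dist.init_process_group(backend="gloo")    # one process per replica, launched e.g. via torchrun

model = MyModel()                          # placeholder model
maml = l2l.algorithms.MAML(model, lr=0.5)
opt = torch.optim.Adam(maml.parameters(), lr=1e-3)
opt = cherry.optim.Distributed(maml.parameters(), opt, sync=1)  # average gradients across workers
opt.sync_parameters()                      # start every replica from identical weights

for iteration in range(1000):
    opt.zero_grad()
    learner = maml.clone()                 # no DDP hooks on the module, so clone() works
    adaptation_loss = compute_adaptation_loss(learner)   # placeholder inner-loop loss
    learner.adapt(adaptation_loss)
    meta_loss = compute_meta_loss(learner)                # placeholder outer-loop loss
    meta_loss.backward()
    opt.step()                             # gradients are averaged across processes before the update
```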
If you want to parallelize the model over GPUs, I would use `torch.nn.DataParallel`:
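A minimal sketch of that setup; wrapping the base module with `DataParallel` before handing it to MAML is an assumption, and `MyModel`/`compute_adaptation_loss` are placeholders:

```python
import torch
import torch.nn as nn
import learn2learn as l2l

model = MyModel().to("cuda")               # placeholder model
model = nn.DataParallel(model)             # replicas are created at forward time; no gradient-reduction hooks
maml = l2l.algorithms.MAML(model, lr=0.5)

learner = maml.clone()                     # clones the wrapper and its parameters
adaptation_loss = compute_adaptation_loss(learner)   # placeholder inner-loop loss
learner.adapt(adaptation_loss)
```

Note that `DataParallel` splits each batch across GPUs inside a single process, so it does not require initializing `torch.distributed`.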
Let me know if you ever find a solution to using `DistributedDataParallel`, I'd be curious to know the solution.

@Kulbear