Investigate if model_without_ddp is needed
See original GitHub issue

🐛 Describe the bug
Investigate if we need `model_without_ddp` in the training script: https://github.com/pytorch/vision/blob/12fd3a625a044a454cca3dbb2187e78efe1b4596/references/classification/train.py#L201
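For context, the pattern in question looks roughly like this (a simplified sketch of the reference script's logic, not the exact code; the `distributed` and `gpu` variables stand in for the script's `args`):

```python
import torch
import torchvision

distributed = False  # True when the script is launched for multi-GPU training
gpu = 0

model = torchvision.models.resnet50()
model_without_ddp = model
if distributed:
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
    model_without_ddp = model.module  # handle to the unwrapped model

# Checkpoints are saved from the unwrapped model, so the state dict keys
# carry no "module." prefix and load cleanly into a plain (non-DDP) model.
checkpoint = {"model": model_without_ddp.state_dict()}
```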
Versions
N/A
cc @datumbox
Issue Analytics

- Created: 2 years ago
- Reactions: 2
- Comments: 9 (9 by maintainers)
Top GitHub Comments
@fmassa Thanks for providing background on why this was added.
So basically, this workaround increases user-friendliness in how the weights are handled after the training is completed (hence outside of the train.py script).

Two thoughts on eliminating the non-parallelized version:

1. The saved checkpoints would carry the `.module` prefix, which can be stripped without real issues.
2. Users would remove the `.module` prefix with the aforementioned method and then use the weights.

I don't have a very strong opinion over this, but I'm leaning towards keeping it for the time being. Yes, it's a bit annoying to keep the non-parallelized version around, but it does eliminate potential frustration for new users of the library. Thoughts?
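For reference, stripping the prefix by hand is a one-liner over the state dict; a minimal sketch, assuming a checkpoint saved under the key "model" (the path is hypothetical):

```python
import torch

# Checkpoint written from a DDP-wrapped model (hypothetical path and layout).
checkpoint = torch.load("checkpoint.pth", map_location="cpu")

# Drop the "module." prefix that DDP adds to every parameter name.
state_dict = {
    key[len("module."):] if key.startswith("module.") else key: value
    for key, value in checkpoint["model"].items()
}
```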
@prabhat00155 what you would have needed is to try to load the serialized checkpoint into a model that hasn't been wrapped in DDP yet.
Something as simple as the following (a reconstructed sketch; the exact model and checkpoint names are assumptions):
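```python
import torch
import torchvision

# Rebuild the architecture without the DDP wrapper and try to load a
# checkpoint that was saved from a DDP-wrapped model.
model = torchvision.models.resnet50()
checkpoint = torch.load("checkpoint.pth", map_location="cpu")
model.load_state_dict(checkpoint["model"])  # raises: unexpected "module.*" keys
```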
would fail due to the `module.` prefix that DDP prepends to every key in the state dict, so you would need to use tools like `torch.nn.modules.utils.consume_prefix_in_state_dict_if_present`, which is pretty new and was added to PyTorch less than six months ago: https://github.com/pytorch/pytorch/pull/53224
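A hedged usage sketch of that utility (the checkpoint layout is an assumption):

```python
import torch
import torchvision
from torch.nn.modules.utils import consume_prefix_in_state_dict_if_present

# Load the DDP-saved state dict (hypothetical checkpoint layout).
state_dict = torch.load("checkpoint.pth", map_location="cpu")["model"]

# Strips the "module." prefix in place when present; a no-op otherwise.
consume_prefix_in_state_dict_if_present(state_dict, "module.")

model = torchvision.models.resnet50()
model.load_state_dict(state_dict)  # now succeeds
```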
I would be ok removing the current `model_without_ddp` in torchvision if we use a newer and better way that is provided by PyTorch, but I'm not sure that the current `torch.nn.modules.utils.consume_prefix_in_state_dict_if_present` is enough for that (at least it would need some thinking to make sure all cases are handled properly).