DeepSpeed support for ignite.distributed
🚀 Feature
PyTorch Lightning recently added native support for Microsoft DeepSpeed.
I believe it would also be helpful for users if Ignite incorporated the DeepSpeed pipeline for memory-efficient distributed training.
1. What should happen for idist.auto_model …?
To initialize the DeepSpeed engine:
    # From the DeepSpeed "Getting Started" guide: wrap the model into a DeepSpeed engine
    model_engine, optimizer, _, _ = deepspeed.initialize(args=cmd_args,
                                                         model=model,
                                                         model_parameters=params)
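For illustration only, here is a hypothetical sketch of how an idist.auto_model-style helper could wrap that call; auto_model_deepspeed and ds_config are made-up names, not an existing ignite API, and passing the config as a dict assumes a reasonably recent DeepSpeed release:

    import deepspeed

    def auto_model_deepspeed(model, ds_config):
        # Hypothetical helper (not an existing ignite API): wraps
        # deepspeed.initialize() the way idist.auto_model() wraps a model
        # in DistributedDataParallel.
        model_engine, optimizer, _, _ = deepspeed.initialize(
            model=model,
            model_parameters=[p for p in model.parameters() if p.requires_grad],
            config=ds_config,  # DeepSpeed config as a dict (or a path to a JSON file)
        )
        return model_engine, optimizer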
And for the distributed environment setup, we need to replace torch.distributed.init_process_group(...) with deepspeed.init_distributed().
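A minimal sketch of that swap, assuming the launcher already exports the usual MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE variables and that the NCCL backend is wanted:

    import deepspeed

    # Previously: torch.distributed.init_process_group(backend="nccl")
    # DeepSpeed's helper reads the same environment variables and initializes
    # the native torch.distributed process group under the hood.
    deepspeed.init_distributed(dist_backend="nccl")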
2. What about the checkpoint handler?
Checkpointing is handled slightly differently in DeepSpeed:

    # DeepSpeed saves checkpoints through the engine rather than the raw model
    model_engine.save_checkpoint(args.save_dir, ckpt_id, client_sd=client_sd)
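For illustration only, a sketch of how that call could be attached to an ignite Engine through an event handler; attach_deepspeed_checkpointing, trainer and save_dir are assumed names, and the client_state keyword follows current DeepSpeed releases (older docs spell it client_sd):

    from ignite.engine import Engine, Events

    def attach_deepspeed_checkpointing(trainer: Engine, model_engine, save_dir):
        # Save a DeepSpeed checkpoint at the end of every epoch.
        @trainer.on(Events.EPOCH_COMPLETED)
        def _save(engine):
            tag = f"epoch_{engine.state.epoch}"
            # client_state carries arbitrary user state to restore on load_checkpoint()
            model_engine.save_checkpoint(save_dir, tag, client_state={"epoch": engine.state.epoch})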
@Kashu7100 thanks for the feature request!
Yes, we plan to improve our support of the DeepSpeed framework. Roughly, our idea was to provide basic integration examples of how to use ignite and deepspeed together. I looked at it multiple times, and due to a certain overlap between the two frameworks it was not obvious where to put the split.
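For illustration, a minimal sketch of what such an integration could look like; model, criterion, cmd_args and train_loader are placeholders, and the update logic follows the DeepSpeed training-loop pattern rather than an established ignite recipe:

    import deepspeed
    from ignite.engine import Engine

    model_engine, optimizer, _, _ = deepspeed.initialize(
        args=cmd_args, model=model, model_parameters=model.parameters()
    )

    def train_step(engine, batch):
        # DeepSpeed owns backward() and step(); the ignite Engine only drives the loop.
        inputs, targets = batch
        inputs = inputs.to(model_engine.local_rank)
        targets = targets.to(model_engine.local_rank)
        loss = criterion(model_engine(inputs), targets)
        model_engine.backward(loss)
        model_engine.step()
        return loss.item()

    trainer = Engine(train_step)
    trainer.run(train_loader, max_epochs=5)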
@sdesrozis I'm not sure whether we should add it as a new backend or not. Let's first create a basic integration example and see which parts of the DeepSpeed code could be simplified using idist.
@Kashu7100 Finally, introducing a new backend does not seem to be a good option. Have a look here, and you will see that native PyTorch distributed is used when the distributed environment variables are set.
That is good news for simple use cases.
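A hedged sketch of that simple case, assuming the launcher has already exported MASTER_ADDR, MASTER_PORT, RANK, LOCAL_RANK and WORLD_SIZE; whether an explicit idist.sync() call is needed after an externally initialized process group is an assumption here:

    import deepspeed
    import ignite.distributed as idist

    # With the standard environment variables set, DeepSpeed falls back to the
    # native torch.distributed process group instead of a separate backend.
    deepspeed.init_distributed(dist_backend="nccl")

    # Let idist pick up the already-initialized process group.
    idist.sync()
    print(idist.get_rank(), idist.get_world_size())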
I would say yes.