DDLRUN + DeepSpeed on SUMMIT

Hi,

I am trying to use DeepSpeed on Summit with ddlrun, but it doesn’t work properly. I am testing it with the CIFAR-10 example, launched like this: ddlrun deepspeed cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config.json

Could you please provide an example of using DeepSpeed with Horovod, MPI, and ddlrun?
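
The thread does not record a confirmed fix, so the following is only a sketch of one common pattern for MPI-launched PyTorch jobs, not an answer from the maintainers. It assumes the launcher exports Open MPI-style rank variables (`OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE`, `OMPI_COMM_WORLD_LOCAL_RANK`); whether ddlrun on Summit does so is an assumption here. The helper name `mpi_env_to_torch_env` is hypothetical. It copies those values to the `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` names that `torch.distributed` reads when initialized from the environment.

```python
def mpi_env_to_torch_env(env):
    """Return torch.distributed-style variables derived from Open MPI-style ones.

    `env` is any mapping of environment variable names to string values.
    Variables missing from `env` are skipped rather than guessed.
    """
    # Assumption: the MPI launcher sets these Open MPI variable names.
    mapping = {
        "OMPI_COMM_WORLD_RANK": "RANK",
        "OMPI_COMM_WORLD_SIZE": "WORLD_SIZE",
        "OMPI_COMM_WORLD_LOCAL_RANK": "LOCAL_RANK",
    }
    translated = {}
    for mpi_name, torch_name in mapping.items():
        if mpi_name in env:
            translated[torch_name] = env[mpi_name]
    return translated
```

A training script could call `os.environ.update(mpi_env_to_torch_env(os.environ))` before initializing the process group. `MASTER_ADDR` and `MASTER_PORT` would still have to be set separately, since MPI does not provide them.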

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

agemagician commented, Feb 27, 2020 (1 reaction)

Thanks @jeffra for the update. I will test it and give you my feedback.

agemagician commented, Feb 28, 2020 (0 reactions)

Oh, that was actually for using Megatron-LM code, which doesn’t use DeepSpeed distributed code.

I will test it again with the cifar test.
