
WarmupDecayLR.params.total_num_steps - total or per gpu?

See original GitHub issue

We are having a bit of a hard time getting total_num_steps to pass to WarmupDecayLR at init time - it’s a bit too early for that logic, since these values are only known once DDP/DeepSpeed has started. We found a workaround, but it doesn’t take the number of GPUs into account.

I wanted to check that WarmupDecayLR.params.total_num_steps expects the total for the whole world and not per GPU.
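
For concreteness, here is a small sketch of the arithmetic at stake, with purely illustrative numbers (none of these values come from the original issue). If total_num_steps is a world total, the number of GPUs enters the effective batch size; if it were per GPU, it wouldn’t:

    import math

    # Illustrative values only - substitute your own training setup.
    num_train_samples = 100_000
    num_epochs = 3
    per_gpu_batch_size = 8
    gradient_accumulation_steps = 4
    world_size = 8  # total number of GPUs across all nodes

    # "Whole world" reading: all GPUs contribute to one optimizer step,
    # so world_size enters the effective batch size.
    effective_batch = per_gpu_batch_size * gradient_accumulation_steps * world_size
    total_num_steps = num_epochs * math.ceil(num_train_samples / effective_batch)

    # "Per GPU" reading: world_size would drop out of the formula.
    per_gpu_steps = num_epochs * math.ceil(
        num_train_samples / (per_gpu_batch_size * gradient_accumulation_steps)
    )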

Currently the doc says:

class WarmupDecayLR(WarmupLR):
[...]
            total_num_steps (int): total number of training steps

so it’d be awesome if the docs disambiguated which of the two is meant.

Thank you!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
stas00 commented, Jan 7, 2021

It worked beautifully and greatly simplified our code - thank you, @jeffra

It’d be awesome to document it at https://www.deepspeed.ai/getting-started/#writing-deepspeed-models perhaps?

I’ve proposed a doc update to reflect this important function: https://github.com/microsoft/DeepSpeed/pull/644

1 reaction
stas00 commented, Jan 7, 2021

oh, you have recently added just that - fantastic - I will give it a try and report back. Thank you, @jeffra
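
For readers landing here later, a plausible reading of the resolution is that the scheduler gets constructed in client code, once the distributed environment is up and the step count can be computed, and is then handed to deepspeed.initialize rather than declared in the JSON config. A minimal sketch under that assumption, using a toy model and placeholder numbers (deepspeed.initialize does accept an lr_scheduler argument, and WarmupDecayLR lives in deepspeed.runtime.lr_schedules):

    import torch
    import deepspeed
    from deepspeed.runtime.lr_schedules import WarmupDecayLR

    # Toy model and optimizer - placeholders for a real training script.
    model = torch.nn.Linear(10, 10)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # Computed *after* the distributed setup is known, e.g.
    # epochs * steps_per_epoch for the whole world.
    total_num_steps = 1_000

    scheduler = WarmupDecayLR(
        optimizer,
        total_num_steps=total_num_steps,
        warmup_num_steps=100,
    )

    # Passing the scheduler object directly sidesteps the config-time
    # chicken-and-egg problem described in the question above.
    engine, optimizer, _, scheduler = deepspeed.initialize(
        model=model,
        optimizer=optimizer,
        lr_scheduler=scheduler,
        config={"train_batch_size": 8},
    )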

Read more comments on GitHub >

Top Results From Across the Web

  • DeepSpeed Integration - Hugging Face
    Gathering Parameters. Under ZeRO-3 on multiple GPUs no single GPU has all the parameters unless it’s the parameters for the currently executing layer...
  • Learning Rate Schedulers — DeepSpeed 0.8.0 documentation
    Sets the learning rate of each parameter group according to learning rate range test (LRRT) policy. The policy increases learning rate starting from...
  • DeepSpeed Configuration JSON
    The maximum number of parameters resident per GPU before releasing. Smaller values use less memory, but perform more communication. 1e9...
  • Training with multiple GPUs - NVIDIA Documentation Center
    Horovod synchronizes the gradients across all processes at each training step. ... to set up the training parameters properly with multi-GPU training.
  • Scalable, Low-cost Training of Massive Deep Learning Models
    For example, Nvidia’s NVLink connects 16 GPUs within a DGX-2 server at 2.4 Tbps all-to-all bandwidth but the DGX-2 servers themselves are connected...
