Delayed Parameter Update when step(wait=False)

See original GitHub issue

Is your feature request related to a problem? Please describe.

This could arguably be a question rather than a feature request. I’m trying to use TrainingAverager with step(wait=False). That requires data_lock, and use_old_local_tensor=True follows from it.

When use_old_local_tensor=True, is it correct to simply add the weight difference between the local model and the all-reduced model to the new model parameters? The gradients calculated from the old model parameters are being added to the new model parameters, which doesn’t seem quite right.
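
For concreteness, here is a rough sketch of the mechanism in question as I understand it (names and structure are illustrative, not hivemind’s actual TrainingAverager code): the averager remembers the parameters as they were when the asynchronous all-reduce started, and once the result arrives it adds the averaged-minus-old difference onto whatever the parameters have become in the meantime.

```python
import torch

def apply_async_average(param: torch.Tensor,
                        old_local: torch.Tensor,
                        averaged: torch.Tensor) -> None:
    """Add the (averaged - old_local) delta onto the current, possibly newer, parameter."""
    with torch.no_grad():
        param.add_(averaged - old_local)
```

This preserves the local steps taken while the all-reduce was in flight, but those steps were computed from the pre-average parameters, which is exactly the concern above.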

Describe the solution you’d like

https://arxiv.org/abs/2101.06840 proposes Delayed Parameter Update (DPU), in which the parameter update is delayed by one step. Apparently, it makes little difference to the training curve if DPU is applied after 40 iterations in BERT-large training.

I think that to implement DPU, you simply have to copy the averaged tensors back to the model at the beginning of step().
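
A minimal sketch of that idea, assuming a hypothetical launch_allreduce callable that returns a future of the averaged tensors (illustrative only, not hivemind’s actual API):

```python
import torch

class DelayedParameterUpdate:
    """Illustrative only: parameters averaged during step t are applied at the start of step t+1."""

    def __init__(self, params, launch_allreduce):
        self.params = list(params)
        self.launch_allreduce = launch_allreduce  # hypothetical: returns a future of averaged tensors
        self.pending = None

    def step(self):
        # 1) at the beginning of step(), copy the previous step's averaged tensors into the model
        if self.pending is not None:
            averaged = self.pending.result()  # blocks only if the previous all-reduce is still running
            with torch.no_grad():
                for param, avg in zip(self.params, averaged):
                    param.copy_(avg)
        # 2) start a new non-blocking all-reduce; its result will be applied one step from now
        self.pending = self.launch_allreduce([param.detach().clone() for param in self.params])
```

With this scheme the model always trains on parameters that are one averaging round stale, which is the one-step delay the paper reports has little effect on the training curve.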

Describe alternatives you’ve considered

I understand that if the weight difference is not added back, the local steps taken before the asynchronous all-reduce completes are wasted. Not only does that defeat the purpose of asynchronous all-reduce (if local updates are going to be wasted until the async operation completes, why not just go synchronous), but it also skips over input data, which could hurt training.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
justheuristic commented, Dec 15, 2021

To whom it may concern: delayed parameter updates are enabled with hivemind.Optimizer via delay_grad_averaging=True, delay_optimizer_step=True.
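
For illustration, here is a minimal setup using those two flags. The flags themselves are the ones named above; the remaining arguments are a plausible configuration whose names and defaults may differ between hivemind versions, so treat this as a sketch rather than a canonical recipe.

```python
import torch
import hivemind

# A toy model and a DHT node to join the collaborative run.
model = torch.nn.Linear(64, 10)
dht = hivemind.DHT(start=True)

opt = hivemind.Optimizer(
    dht=dht,
    run_id="dpu_demo",                 # hypothetical experiment name
    batch_size_per_step=32,            # samples processed locally per step
    target_batch_size=4096,            # global batch size that triggers an optimizer step
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    offload_optimizer=True,            # commonly used together with the delayed flags
    delay_grad_averaging=True,         # overlap gradient averaging with local compute
    delay_optimizer_step=True,         # apply the optimizer step one iteration later (DPU)
    verbose=True,
)
```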

Minimalistic example: benchmark_optimizer.py

More advanced usage examples (with full or partial DPU, at the user’s discretion):

A more detailed API reference can be found here:

Currently, DPU requires installing hivemind from the GitHub repo, i.e. pip install https://github.com/learning-at-home/hivemind/archive/master.zip

It will be available from PyPI after v1.0.0 is released, which is to say “sometime very soon”.

If you have any other questions, feel free to open another issue or join our Discord channel (link above).

0 reactions
justheuristic commented, Oct 30, 2021

“when lr is decreased by the lr scheduler (by 0.1× in a step-wise fashion) at around 1.2k steps, the training seems to be working”

That might indeed be the case. In our DPU experiments, we enabled it early on, during the initial LR warmup, so the learning rate was still very small. That might have allowed DPU to phase in without significant performance drawdown.

P.S. I finished the rest of my backlog yesterday and am now working on making DPU work in hivemind master. I’d still appreciate it if you have time to chat a little to better coordinate our efforts (we can meet on Discord or whichever other means of communication you prefer). Anyway, I’ll post updates to this thread as soon as I make any meaningful progress (within <=96h).
