ZeRO 3 example does not run

The ZeRO 3 example does not run. The main problem appears to be that the InitContext function does not exist, even though pretrain_gpt2.py calls it. I tried several changes to get it running (including changing the batch size, the initialization function, and some of the inputs to the initialization function) but gave up after it threw the error "variable beta1 is referenced before assignment". I think that has something to do with the optimizer?
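
A report like this (a script calling a name that the library does not define) can usually be confirmed by asking the installed package which names it actually exposes. The helper below is a generic sketch and not part of DeepSpeed; `find_exported` is a hypothetical name, and it is demonstrated on the standard library so that it runs without DeepSpeed installed.

```python
import importlib


def find_exported(module_name, candidates):
    """Return the names in `candidates` that the module actually defines.

    Returns None if the module cannot be imported at all.
    (Hypothetical helper for illustration; not a DeepSpeed API.)
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return [name for name in candidates if hasattr(module, name)]


# Demonstrated on the standard library so the sketch runs anywhere.
print(find_exported("math", ["sqrt", "InitContext"]))
print(find_exported("no_such_package_xyz", ["InitContext"]))
```

On a machine with DeepSpeed installed, the same call could be pointed at whichever DeepSpeed module pretrain_gpt2.py imports from, with "InitContext" among the candidates. The module path is not stated in the issue, so it would have to be read off the script's import lines.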

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
StellaAthena commented, Apr 28, 2021

> Same issue. Is there a workaround to get Megatron with Zero3 running?

I was never able to get this implementation to run. However, I do have ZeRO-3 implemented in my Megatron variant. You can check out the ZeRO-3-Brrrrr branch of www.github.com/eleutherai/gpt-neox if you want to give it a try. Unfortunately, it's not currently compatible with the main branch, but I hope to have that remedied this weekend or next week (it has nothing to do with ZeRO, just internal changes that need to be made for consistency with other commits).

1 reaction
StellaAthena commented, Mar 10, 2021

@ShadenSmith Unfortunately, it still does not run. The current error is that beta1 is referenced before assignment. I can get a stack trace in a couple of minutes.
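
For readers unfamiliar with this class of error: in Python, "referenced before assignment" is an UnboundLocalError, raised when a function reads a local variable that was only assigned on a code path that did not run. The thread does not include the stack trace, so the sketch below is not the code from the example. It is a minimal, hypothetical reproduction of the pattern (the function names and the "adam"/"sgd" values are assumptions for illustration), followed by one common fix.

```python
def optimizer_settings(optimizer_name):
    # Hypothetical function: beta1 is bound only inside the "adam" branch.
    if optimizer_name == "adam":
        beta1 = 0.9
    # For any other optimizer name, beta1 was never assigned in this scope,
    # so reading it here raises UnboundLocalError.
    return {"beta1": beta1}


def optimizer_settings_fixed(optimizer_name):
    # Fix: bind the variable before branching so every path defines it.
    beta1 = None
    if optimizer_name == "adam":
        beta1 = 0.9
    return {"beta1": beta1}


try:
    optimizer_settings("sgd")
except UnboundLocalError as err:
    # The exact message wording differs between Python versions.
    print(type(err).__name__, "-", err)

print(optimizer_settings_fixed("sgd"))
```

If the real failure follows this pattern, the stack trace would point at the line that reads beta1, and the missing assignment would be in a branch above it that was skipped for the configuration in use.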
