ZeRO 3 example does not run
The ZeRO 3 example does not run. The main problem appears to be that the `InitContext` function does not actually exist despite being called by `pretrain_gpt2.py`. I have tried to introduce some changes to get it to run (incl. changing the batch size, the initialization function, and some of the inputs to the initialization function) but gave up after it threw the error `variable beta1 is referenced before assignment`. I think that has to do with something wonky in the optimizer?
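For context, the `beta1` message reads like Python's `UnboundLocalError`, which is raised when a local name is used on a code path where it was never assigned. Below is a minimal sketch of one way that can happen, assuming (hypothetically) that an optimizer step only unpacks `betas` inside a conditional branch. The names `step_sketch`, `try_step`, and `momentum_coeff` are invented for illustration; this is not the actual DeepSpeed or Megatron code, and the real cause may well differ.

```python
def step_sketch(param_groups):
    """Hypothetical optimizer step; NOT the actual DeepSpeed/Megatron code."""
    for group in param_groups:
        if "betas" in group:
            # beta1 is only assigned when the group carries a "betas" entry.
            beta1, beta2 = group["betas"]
        # If the branch above never ran, beta1 is still unbound here.
        group["momentum_coeff"] = beta1
    return param_groups


def try_step(param_groups):
    """Run the sketch and report whether it hit the unbound-variable error."""
    try:
        step_sketch(param_groups)
        return "ok"
    except UnboundLocalError as exc:
        return f"{type(exc).__name__}: {exc}"


print(try_step([{"betas": (0.9, 0.999)}]))  # group with betas runs cleanly
print(try_step([{"lr": 1e-3}]))             # group without betas raises
```

If something like this is the cause, checking which keys the optimizer's parameter groups actually receive from the config would be a reasonable first step, and the stack trace should show the exact line.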
Issue Analytics
- Created 3 years ago
- Comments: 6 (1 by maintainers)
Top GitHub Comments
I was unable to ever get this implementation to run. However I do have ZeRO-3 implemented in my megatron variant. You can check out the `ZeRO-3-Brrrrr` branch of www.github.com/eleutherai/gpt-neox if you want to give it a try. Unfortunately it’s not currently compatible with the `main` branch but I hope to have that remedied this weekend or next week (it has nothing to do with ZeRO, just internal changes that need to be made for consistency with other commits).

@ShadenSmith Unfortunately it still does not run. The current error is that `beta1` is referenced before assignment. I can get a stack trace in a couple minutes.