T5 generate with do_sample doesn't work on DeepSpeed Stage 3
System Info
- transformers == 4.20.1
- python == 3.8.13
- OS == Ubuntu 20.04
- DeepSpeed == 0.6.7
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
https://github.com/microsoft/DeepSpeed/issues/2022#issuecomment-1158389764
Expected behavior
All processes run and finish.
Issue Analytics
- State:
- Created a year ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
oops, apologies for the typo - fixed! glad you figured it out, @lkm2835
This has nothing to do with T5 specifically; it is just how ZeRO stage 3 works. All GPUs need to work in sync, so if one GPU finishes generating, it still has to keep running
forward
because ZeRO distributes the weight shards across all GPUs, and if one GPU stops, the others can no longer fetch the shards they are missing. So it really depends on the situation: sometimes all GPUs happen to generate the same output length and it works without syncing, but that is just an accident and can easily break down the road.
For more details please see: https://huggingface.co/docs/transformers/main/perf_train_gpu_many#zero-data-parallelism
@lkm2835, looking at the code you linked to, you must use
generate(..., synced_gpus=True)
when using ZeRO stage-3.
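The scheduling behavior that `synced_gpus=True` enforces can be sketched in plain Python. This is a toy simulation, not DeepSpeed or transformers code; the rank count and target lengths below are invented for illustration. The point is that every "rank" keeps joining the forward step until the longest sequence is done, so no rank ever drops out of the collective:

```python
# Toy sketch of why ZeRO stage-3 needs synced_gpus during generate().
# Each "rank" wants a different output length; under ZeRO stage-3 every
# forward pass is a collective op, so finished ranks must keep stepping
# (as no-ops) until the slowest rank is done.

def synced_generate(target_lengths):
    """Simulate a synced generation loop across len(target_lengths) ranks.

    Returns the per-rank outputs and the number of forward steps taken,
    which equals max(target_lengths): every rank runs that many steps.
    """
    outputs = [[] for _ in target_lengths]
    num_steps = 0
    while any(len(out) < tgt for out, tgt in zip(outputs, target_lengths)):
        # Collective forward: all ranks participate, finished or not.
        for rank, tgt in enumerate(target_lengths):
            if len(outputs[rank]) < tgt:
                outputs[rank].append(f"tok{num_steps}")
            # else: this rank already finished, but it still joins the
            # collective so the other ranks can fetch its weight shards.
        num_steps += 1
    return outputs, num_steps

outs, steps = synced_generate([2, 5, 3])
# Every rank ran 5 forward steps, the length of the longest sequence.
```

Without the sync, the rank that finished after 2 tokens would stop calling forward, and the remaining ranks would hang waiting for the weight shards it owns, which matches the hang reported in this issue.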