[wav2vec] deepspeed eval bug in the case of >1 gpus
Environment info
- `transformers` version: 4.5.1
- Platform: Linux-4.15.0-140-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.9
- PyTorch version (GPU?): 1.8.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: <2,4>
- Using distributed or parallel set-up in script?: <distributed>
Who can help
@stas00 @patrickvonplaten @patil-suraj
Information
I’m working on wav2vec 2.0 using the following official Hugging Face script: https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/run_common_voice.py
I am trying to fine-tune the model on multiple GPUs using DeepSpeed.

```
deepspeed --num_gpus=1 run_common_voice.py --deepspeed ds_config.json --do_train --do_eval
```

works, but

```
deepspeed --num_gpus=2 run_common_voice.py --deepspeed ds_config.json --do_train --do_eval
```

freezes at the end of eval: the progress bar reaches 100%, but the eval result is never returned.
To reproduce
This is how to reproduce: https://colab.research.google.com/drive/1VRCGcnhBlrMFYQ5aaNebucZuja-WB2I2?usp=sharing
Steps to reproduce the behavior:
- Install deepspeed
- Add `with autocast():` after line 481 in run_common_voice.py (an illustrative sketch of this change follows the list)
- Set params: `--deepspeed ds_config.json --do_train --do_eval`
- Run run_common_voice.py using deepspeed with >1 GPUs
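For orientation, the modification in step 2 amounts to wrapping the forward pass in `torch.cuda.amp.autocast`. The sketch below is an assumption about what that edit looks like, modeled on the script's custom trainer; the exact code around line 481 may differ between versions:

```python
from torch.cuda.amp import autocast

# Hypothetical sketch, not the exact upstream code: wrap the loss
# computation of the custom trainer's training_step in autocast.
def training_step(self, model, inputs):
    model.train()
    inputs = self._prepare_inputs(inputs)
    with autocast():  # the added line: forward pass runs in mixed precision
        loss = self.compute_loss(model, inputs)
    loss.backward()
    return loss.detach()
```

Note that under DeepSpeed the backward pass is normally driven by the DeepSpeed engine rather than a plain `loss.backward()`; the sketch only shows where the `autocast` wrapper goes.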
ds_config.json has the following parameters:

```json
{
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "opt_level": "O3"
    },
    "steps_per_print": 100,
    "wall_clock_breakdown": false
}
```
Expected behavior
The fine-tuning eval should complete without freezing.
Top GitHub Comments
You’re welcome to follow my progress on fixing this issue at https://github.com/huggingface/transformers/pull/11638
ZeRO-2 works fully. ZeRO-3 still has one issue, but fp32 works.
Do try and let me know if you run into any problems.
OK, this is a new type of model that requires a special type of handling.
The NLP models get `long` inputs, which get converted to the same dtype as the embedding weights, and under deepspeed/fp16 those weights are `float16`. Currently deepspeed does `model.half()`. This model, however, receives inputs that are `float32`, and it doesn't check whether the model weights are fp16 or not. Hence the error.
So this is one way to fix it: cast the inputs to the weights' dtype. The test I was using is a dtype guard along these lines:
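A minimal sketch of that kind of guard, assuming it lives in the wav2vec2 feature extractor's first convolutional layer; `self.conv` and `self.activation` are assumed attribute names here, and the actual patch is in the PR linked above:

```python
import torch
import torch.nn as nn

class ConvLayer(nn.Module):
    # Stand-in for a wav2vec2 feature-extractor conv layer.
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=10, stride=5)
        self.activation = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Raw audio arrives as float32 even after deepspeed has done
        # model.half(), so cast the inputs to match the weights.
        if hidden_states.dtype != self.conv.weight.dtype:
            hidden_states = hidden_states.to(self.conv.weight.dtype)
        return self.activation(self.conv(hidden_states))
```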
Could probably move it to the top-level layer so it’d work in all cases, if this exact path isn’t always taken.
So this overcomes the first error, but now I'm running into the next one, so I need to look more to see what to do there; probably I need to switch to float32 just for that op.
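For context, the usual pattern for running a single numerically sensitive op in fp32 inside an otherwise-fp16 model is to upcast around just that op. The thread doesn't identify which op this was; `layer_norm` below is only a placeholder example:

```python
import torch
import torch.nn.functional as F

def fp32_layer_norm(hidden_states, weight, bias, eps=1e-5):
    # Upcast just this op to float32, then cast the result back so the
    # rest of the model can stay in fp16.
    orig_dtype = hidden_states.dtype
    out = F.layer_norm(
        hidden_states.float(),
        normalized_shape=(hidden_states.shape[-1],),
        weight=weight.float(),
        bias=bias.float(),
        eps=eps,
    )
    return out.to(orig_dtype)
```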
However, when I run it, it appears that maybe this model can't be trained or eval'ed in fp16/mixed precision at all.
We have multiple models that won't train under `fp16` mixed precision, because they were pretrained in `bfloat16`, which doesn't lend itself to the `fp16` numerical range. The DeepSpeed devs are working on adding the fp32 mode (next release, hopefully): https://github.com/microsoft/DeepSpeed/pull/1004
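To make the range mismatch concrete, a quick illustration in plain PyTorch:

```python
import torch

# bfloat16 keeps float32's 8-bit exponent, so its largest finite value
# is ~3.4e38, while float16 tops out at 65504. Values that were fine
# during bf16 pretraining can overflow to inf once cast to fp16.
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
print(torch.finfo(torch.float16).max)   # 65504.0

x = torch.tensor(1e5, dtype=torch.bfloat16)
print(x.to(torch.float16))              # tensor(inf, dtype=torch.float16)
```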
p.s. please don’t mix `amp` with running modes that don’t use `amp` (deepspeed is one of them).