Seq2seq now has larger memory requirements, OOM w/DeepSpeed on previously runnable models
(A continuation of #10149, since it looks like it’s a broader issue.)
It looks like the seq2seq example has changed in the past week and now gives out-of-memory errors with @stas00's impressive recent DeepSpeed work, which allowed training and predicting with e.g. T5-11B on a single 40GB card.
Here’s a simple repeatable example using the newer scripts:
Run script:
export OUTPUTDIR=tst-summarization
export BS=1; rm -rf $OUTPUTDIR; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=4 ./run_seq2seq.py \
--model_name_or_path allenai/unifiedqa-t5-11b \
--do_train \
--do_eval \
--do_predict \
--task summarization \
--dataset_name xsum \
--output_dir $OUTPUTDIR \
--per_device_train_batch_size=$BS \
--per_device_eval_batch_size=$BS \
--overwrite_output_dir \
--predict_with_generate \
--max_train_samples 500 \
--max_val_samples 100 \
--max_test_samples 100
(One note: should I be adding a --deepspeed option, as with the old finetune_trainer.py? I am not seeing it in the list of options. If so, should it point to the new location of the config file ( …/tests/deepspeed/ds_config.json ), or is that location used by default?)
Conda Environment:
# Make new environment
conda create --name transformers-feb12-2021 python=3.8
conda activate transformers-feb12-2021
# Clone transformers
git clone https://github.com/huggingface/transformers.git
cd transformers
# Install nightly build of Pytorch
pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html -U
# Install seq2seq transformers requirements
pip install -r examples/seq2seq/requirements.txt
# Install transformers
pip install -e .
# Install DeepSpeed from source for the A100 support
cd ..
git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed/
# Checkout release for DeepSpeed 0.3.10 (to avoid AMD bug in latest)
git checkout c14b839d9
./install.sh
pip install .
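When reproducing a report like this, it can help to record which versions actually ended up in the environment after the steps above. The following is a minimal sketch, not part of the original report: the helper name `installed_versions` is hypothetical, and the package names are assumed to match the distributions installed by the commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions based on the install steps above.
    for name, version in installed_versions(["torch", "transformers", "deepspeed"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside the error makes it easier to compare a failing environment with one where the same model previously ran.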
Error:
...
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 2; 39.59 GiB total capacity; 37.87 GiB already allocated; 40.69 MiB free; 37.88 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "./run_seq2seq.py", line 629, in <module>
main()
File "./run_seq2seq.py", line 543, in main
trainer = Seq2SeqTrainer(
File "/home/pajansen/github/transformers-feb12-2021/transformers/src/transformers/trainer.py", line 276, in __init__
model = model.to(args.device)
File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 673, in to
return self._apply(convert)
File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
[Previous line repeated 4 more times]
File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 409, in _apply
param_applied = fn(param)
File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 671, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 3; 39.59 GiB total capacity; 37.87 GiB already allocated; 40.69 MiB free; 37.88 GiB reserved in total by PyTorch)
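The traceback shows the failure inside `model.to(args.device)` in the `Trainer` constructor, i.e. while the whole model is being moved onto a GPU. A rough back-of-the-envelope check, not a diagnosis, suggests why that step alone could exhaust the card. It assumes about 11 billion parameters (taken from the model name, not measured) and counts weights only, ignoring gradients, optimizer state, and activations; the 39.59 GiB capacity is the figure from the error message.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


NUM_PARAMS = 11e9          # assumption: "11B" taken from the model name
GPU_CAPACITY_GIB = 39.59   # total capacity reported in the error message

for dtype_name, bytes_per_param in (("fp32", 4), ("fp16", 2)):
    need = weights_gib(NUM_PARAMS, bytes_per_param)
    fits = need < GPU_CAPACITY_GIB
    print(f"{dtype_name}: {need:.2f} GiB for weights, fits on one card: {fits}")
```

Under these assumptions the fp32 weights come to roughly 41 GiB, slightly more than the card holds, while fp16 weights come to about 20.5 GiB. That is consistent with an OOM when a full-precision copy is placed on each GPU, but it does not by itself identify what changed in the scripts.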
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:22 (14 by maintainers)
Top GitHub Comments
Another update: DeepSpeed currently locks one in if one wants to be able to access the fp32 model; see https://github.com/microsoft/DeepSpeed/issues/797. Once they add a method to extract the fp32 model (https://github.com/microsoft/DeepSpeed/issues/800), we can sort this out.
Thank you for the details, @PeterAJansen. I'm hoping to validate later in the day, but meanwhile this PR should solve it: https://github.com/huggingface/transformers/pull/10243 (i.e. use it instead of the patch I sent last night).
Edit: the PR has been merged, so master should be OK.