Seq2seq now has larger memory requirements, OOM w/Deepspeed on previously runnable models

(A continuation of #10149, since it looks like it's a broader issue.)

It looks like seq2seq has changed in the past week and now gives out-of-memory errors with @stas00's impressive recent DeepSpeed work, which allowed training and predicting with, e.g., T5-11B on a single 40GB card.

Here’s a simple repeatable example using the newer scripts:

Run script:

export OUTPUTDIR=tst-summarization
export BS=1; rm -rf $OUTPUTDIR; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=4 ./run_seq2seq.py \
    --model_name_or_path allenai/unifiedqa-t5-11b \
    --do_train \
    --do_eval \
    --do_predict \
    --task summarization \
    --dataset_name xsum \
    --output_dir $OUTPUTDIR \
    --per_device_train_batch_size=$BS \
    --per_device_eval_batch_size=$BS \
    --overwrite_output_dir \
    --predict_with_generate \
    --max_train_samples 500 \
    --max_val_samples 100 \
    --max_test_samples 100

(One note: Should I be adding a --deepspeed option as with the old finetune_trainer.py (I am not seeing it in the list of options)? And if so, should it be pointing to the new location for the config file ( …/tests/deepspeed/ds_config.json ), or does it use this location by default?)
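For reference, the Trainer's TrainingArguments does define a --deepspeed argument that takes a path to a JSON config file. The sketch below builds a small ZeRO stage 2 config with fp16 and CPU offload and serializes it to JSON. The values are illustrative assumptions, not the contents of the repository's ds_config.json, and the function name build_ds_config is made up for this example.

```python
import json


def build_ds_config():
    """Return an illustrative DeepSpeed config dict (assumed values, not the repo's file)."""
    return {
        # Mixed precision: loss_scale 0 asks DeepSpeed for dynamic loss scaling.
        "fp16": {"enabled": True, "loss_scale": 0},
        # ZeRO stage 2 partitions optimizer state and gradients across GPUs;
        # cpu_offload moves optimizer state to host memory (DeepSpeed 0.3.x key).
        "zero_optimization": {
            "stage": 2,
            "cpu_offload": True,
            "contiguous_gradients": True,
            "overlap_comm": True,
        },
    }


if __name__ == "__main__":
    # Print the JSON that could be saved to a file and passed via --deepspeed.
    print(json.dumps(build_ds_config(), indent=2))
```

The printed JSON could be saved to a file and its path passed as the value of --deepspeed on the command line above.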

Conda Environment:

# Make new environment
conda create --name transformers-feb12-2021 python=3.8
conda activate transformers-feb12-2021

# Clone transformers
git clone https://github.com/huggingface/transformers.git
cd transformers

# Install nightly build of Pytorch
pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html -U

# Install seq2seq transformers requirements
pip install -r examples/seq2seq/requirements.txt

# Install transformers
pip install -e .

# Install DeepSpeed from source for the A100 support
cd ..
git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed/
# Checkout release for DeepSpeed 0.3.10 (to avoid AMD bug in latest)
git checkout c14b839d9
./install.sh
pip install .

Error:

...
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 2; 39.59 GiB total capacity; 37.87 GiB already allocated; 40.69 MiB free; 37.88 GiB reserved in total by PyTorch)
Traceback (most recent call last):
  File "./run_seq2seq.py", line 629, in <module>
    main()
  File "./run_seq2seq.py", line 543, in main
    trainer = Seq2SeqTrainer(
  File "/home/pajansen/github/transformers-feb12-2021/transformers/src/transformers/trainer.py", line 276, in __init__
    model = model.to(args.device)
  File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 673, in to
    return self._apply(convert)
  File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 409, in _apply
    param_applied = fn(param)
  File "/home/pajansen/anaconda3/envs/transformers-feb12-2021/lib/python3.8/site-packages/torch/nn/modules/module.py", line 671, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 3; 39.59 GiB total capacity; 37.87 GiB already allocated; 40.69 MiB free; 37.88 GiB reserved in total by PyTorch)
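A rough back-of-the-envelope check of the numbers in this error, as a sketch only: it assumes about 11 billion parameters for T5-11B (an approximation, not a figure taken from the log) and ignores gradients, optimizer state, and activations. Under those assumptions, the fp32 weights alone would not fit in the 39.59 GiB reported per GPU, which is consistent with the failure occurring in model.to(args.device). The helper name param_memory_gib is made up for illustration.

```python
GIB = 2 ** 30  # bytes in one GiB


def param_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


# Assumption: roughly 11e9 parameters for T5-11B.
NUM_PARAMS = 11_000_000_000
# Per-GPU capacity as reported in the error message above.
GPU_CAPACITY_GIB = 39.59

if __name__ == "__main__":
    for label, nbytes in (("fp32", 4), ("fp16", 2)):
        need = param_memory_gib(NUM_PARAMS, nbytes)
        fits = need < GPU_CAPACITY_GIB
        print(f"{label}: {need:.2f} GiB for weights, fits on one GPU: {fits}")
```

Under these assumptions the fp32 weights come to about 41 GiB and the fp16 weights to about 20.5 GiB, before any training state is counted.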

Top GitHub Comments

2 reactions

stas00 commented, Feb 26, 2021

Another update: DeepSpeed currently locks one in if one wants to be able to access the fp32 model; see https://github.com/microsoft/DeepSpeed/issues/797. Once they add a method to extract the fp32 model (https://github.com/microsoft/DeepSpeed/issues/800), we can sort this out.

2 reactions

stas00 commented, Feb 17, 2021

Thank you for the details, @PeterAJansen - hoping to validate later in the day, but meanwhile this PR should solve it https://github.com/huggingface/transformers/pull/10243 (i.e. instead of the patch I sent last night).

Edit: PR merged, so master should be OK.
