A memory leak in evaluation
Environment info
- transformers version: 4.4.0.dev0
- Platform: Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.10
- Python version: 3.8.8
- PyTorch version (GPU?): 1.7.1+cu101 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: False
Who can help
Information
Model I am using (Bert, XLNet …): albert-base-v2, but with a hidden_size of 2048 and num_attention_heads of 16, distilled from albert-xlarge-v2.
The problem arises when using:
- the official example scripts: examples/text-classification/run_glue.py
The task I am working on is:
- an official GLUE/SQuAD task: GLUE QQP
To reproduce
Steps to reproduce the behavior:
I want to evaluate my model on the GLUE QQP task. If I don't use eval_accumulation_steps, my GPU runs out of memory. If I do use eval_accumulation_steps, host memory usage grows until it hits the machine's limit (>250 GB) and the process is killed, so I suspect a memory leak. My running script is below.
```bash
CUDA_VISIBLE_DEVICES=0 ~/.conda/envs/thesis-lyh/bin/python run_glue.py \
  --model_name_or_path $MODEL_PATH \
  --task_name $TASK_NAME \
  --eval_accumulation_steps 1 \
  --do_eval \
  --max_seq_length 128 \
  --per_device_eval_batch_size 1 \
  --output_dir output/glue/$TASK_NAME/$MODEL_NAME/
```
The problem occurs no matter what I set the batch size and accumulation steps to. Models hosted on the model hub, and a smaller model I distilled in the same way, evaluate without issue.
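For context, eval_accumulation_steps only controls how often accumulated outputs are moved from the GPU to host RAM during evaluation; it does not shrink what is accumulated. Below is a minimal sketch of that pattern (the evaluation_loop function and its arguments are illustrative, not Trainer's actual implementation):

```python
import torch

def evaluation_loop(model, dataloader, eval_accumulation_steps=1):
    """Simplified illustration: outputs pile up on the GPU for N steps,
    then get concatenated and offloaded to a growing CPU-side buffer."""
    model.eval()
    gpu_buffer, host_preds = [], []
    for step, batch in enumerate(dataloader):
        with torch.no_grad():
            logits = model(**batch).logits      # shape: (batch_size, num_labels)
        gpu_buffer.append(logits)
        if (step + 1) % eval_accumulation_steps == 0:
            host_preds.append(torch.cat(gpu_buffer).cpu())  # frees GPU memory...
            gpu_buffer = []                                  # ...but grows host RAM
    if gpu_buffer:
        host_preds.append(torch.cat(gpu_buffer).cpu())
    return torch.cat(host_preds)
```

If the model returns extra tensors (such as hidden states) alongside the logits, those are accumulated on the host as well, which is the detail the resolution below points to.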
Expected behavior
I have 250 GB of RAM, which should be more than enough to hold the evaluation results.
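Back-of-the-envelope check (assuming roughly 40k examples in the QQP validation set and fp32 logits): storing only the classification logits is trivially small compared to 250 GB.

```python
# Rough size of the accumulated predictions when only logits are kept (fp32).
num_examples = 40_000   # QQP validation set, approximately
num_labels = 2
logits_bytes = num_examples * num_labels * 4
print(f"{logits_bytes / 1024:.0f} KiB")   # ~312 KiB
```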
Top GitHub Comments
Thank you for your reply. I double-checked my config today and found that I was reusing the config from distillation, with output_hidden_states set to true… I am very sorry for my carelessness, and thank you so much for your time and attention.
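For reference, with output_hidden_states=True the evaluation loop accumulates every layer's hidden states on the host alongside the logits, which at this model size quickly exceeds 250 GB. A hedged sketch of the fix and the arithmetic ("path/to/distilled-albert" is a placeholder path, the ~40k example count is approximate, and 12 hidden layers is the albert-base-v2 default):

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Override the setting carried over from distillation so evaluation returns only logits.
config = AutoConfig.from_pretrained("path/to/distilled-albert", output_hidden_states=False)
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/distilled-albert", config=config
)

# Why hidden states overflow host memory (fp32; values from the report:
# 12 layers + embedding output = 13 tensors, seq_len 128, hidden_size 2048).
per_example = 13 * 128 * 2048 * 4        # ~13.6 MB of hidden states per example
total = per_example * 40_000             # ~545 GB across ~40k QQP validation examples
print(f"{total / 1e9:.0f} GB")
```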
Ah I understand better now 😃