BART on CNN/DM: how to train on a small GPU?
I’m trying to reproduce the CNN/DM results of BART. Unfortunately, I don’t have access to good GPUs: I only have access to 2 GPUs with 8GB of memory each.
I updated the fine-tuning command accordingly (changing `UPDATE_FREQ` for the number of GPUs).
But I have an issue with GPU memory: I tried reducing `MAX_TOKENS` to `512` in order to make the data fit in my 8GB, but I get the following error:
`AssertionError: sentence at index 227550 of size 728 exceeds max_tokens limit of 512!`
If I set `MAX_TOKENS` to `1024`, I get a CUDA out of memory error (expected).
What modifications do I need to make to fine-tune the model on small GPUs (8GB)?
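For reference, here is a minimal sketch of where those knobs live in the fine-tuning command, adapted to 2 GPUs. The flag set is assumed from fairseq’s `examples/bart` summarization recipe (it may differ across fairseq versions), and the `MAX_TOKENS` / `UPDATE_FREQ` values are illustrative, not a tested configuration. The effective batch size scales with `n_gpus × MAX_TOKENS × UPDATE_FREQ`, so fewer GPUs or fewer tokens per batch means a proportionally larger `UPDATE_FREQ`:

```bash
# Sketch: fairseq BART fine-tuning on CNN/DM, adapted for 2 GPUs.
# "cnn_dm-bin" is the binarized data dir from the recipe; all values are
# assumptions based on the stock recipe, not the poster's exact command.
BART_PATH=/path/to/bart.large/model.pt

COMMON_FLAGS=(
    --restore-file "$BART_PATH"
    --task translation --source-lang source --target-lang target
    --arch bart_large --layernorm-embedding
    --share-all-embeddings --share-decoder-input-output-embed
    --truncate-source                       # truncate over-long source articles
    --reset-optimizer --reset-dataloader --reset-meters
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1
    --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08
    --weight-decay 0.01 --clip-norm 0.1
    --lr-scheduler polynomial_decay --lr 3e-05
    --total-num-update 20000 --warmup-updates 500
    --required-batch-size-multiple 1
    --skip-invalid-size-inputs-valid-test   # drop examples that exceed --max-tokens
    --find-unused-parameters
)

# MAX_TOKENS=1024 still OOMs on 8GB cards without further tricks (see the
# comments below); UPDATE_FREQ is raised here (an assumed value) so that
# 2 GPUs accumulate roughly the same effective batch as the stock recipe.
CUDA_VISIBLE_DEVICES=0,1 fairseq-train cnn_dm-bin "${COMMON_FLAGS[@]}" \
    --max-tokens 1024 --update-freq 16 --fp16
```

The `--skip-invalid-size-inputs-valid-test` flag should make fairseq drop examples that exceed `--max-tokens` instead of raising the assertion shown above; whether that is acceptable depends on how many training examples you are willing to lose.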
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@wonjininfo
On my side, I trained BART on 4 x 11GB GPUs.
As mentioned earlier, 11GB is not enough to fit 1 sample (1024 tokens), so I used `--memory-efficient-fp16`. Even though my GPU does not support FP16 training, this reduced the required memory by almost half. But it was still not enough, so I reduced `MAX_TOKENS` from `1024` to `928`. With these parameters, I could fit 1 sample on my GPU.
With `MAX_TOKENS = 928` and `--memory-efficient-fp16`, the results I got are a bit lower than normal BART, but that was expected given my parameters.
I didn’t try training the model with a lower `MAX_TOKENS`, as I could already fit 1 sample with `928`.
Merry Christmas 😃
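For concreteness, those deltas map onto the sketch from the question roughly like this. The `UPDATE_FREQ` value is an assumption (the comment does not give one), and `--memory-efficient-fp16` is passed in place of plain `--fp16`:

```bash
# The commenter's memory-saving deltas, reusing COMMON_FLAGS from the
# sketch above (the UPDATE_FREQ value is an assumption, not from the thread).
CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train cnn_dm-bin "${COMMON_FLAGS[@]}" \
    --max-tokens 928 \
    --update-freq 8 \
    --memory-efficient-fp16  # replaces --fp16: fp16 weights and fp16 optimizer state
```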
Note that `--memory-efficient-fp16` can produce worse results, especially with small batch sizes. You’re probably better off decreasing the batch size and/or training in FP32, since FP16 can actually use more memory: it needs to maintain both an FP32 and an FP16 copy of the model.
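A rough back-of-the-envelope makes that trade-off concrete. The per-parameter layouts below are assumptions about how fairseq’s FP16 optimizers typically store state (plain `--fp16` keeping an FP32 master copy of the weights plus FP32 Adam state, the memory-efficient variant keeping both in FP16); gradients and activations come on top of these figures:

```bash
# Approximate parameter memory for BART-large (~406M parameters).
# Bytes-per-parameter layouts are assumptions, not measurements from this thread.
params=406000000
gib() { awk -v n="$1" 'BEGIN { printf "%.1f GiB\n", n / 1024^3 }'; }
echo "fp32   (4B weights + 8B Adam state):                 $(gib $((params * 12)))"
echo "--fp16 (2B weights + 4B fp32 master + 8B fp32 Adam): $(gib $((params * 14)))"
echo "--memory-efficient-fp16 (2B weights + 4B fp16 Adam): $(gib $((params * 6)))"
```

By this arithmetic, plain FP16 costs more parameter memory than FP32, which matches the caveat above; the savings of `--memory-efficient-fp16` come from dropping the FP32 copies, at the price of less stable updates.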