
BART on CNN/DM: how to train on a small GPU?

See original GitHub issue

I’m trying to reproduce the CNN/DM results of BART.
Unfortunately, I don’t have access to a good GPU; I only have two GPUs with 8GB of memory each.


I updated the fine-tuning command accordingly, changing UPDATE_FREQ to account for the number of GPUs (see the sketch below).
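(For context: fairseq’s effective batch size is roughly NUM_GPUS x MAX_TOKENS x UPDATE_FREQ, so with fewer GPUs you scale --update-freq up by the same ratio to keep it constant. A minimal sketch of that adjustment, assuming the published fairseq BART fine-tuning recipe; the concrete numbers below are illustrative, not the recipe’s values:)

    # Effective batch (tokens per update) = NUM_GPUS * MAX_TOKENS * UPDATE_FREQ.
    # If the published recipe assumed 8 GPUs with UPDATE_FREQ=4, then on
    # 2 GPUs you would use 4 * (8 / 2) = 16 to keep the same effective batch.
    MAX_TOKENS=1024
    UPDATE_FREQ=16

    CUDA_VISIBLE_DEVICES=0,1 fairseq-train cnn_dm-bin \
        --max-tokens "$MAX_TOKENS" \
        --update-freq "$UPDATE_FREQ"
        # ...remaining flags as in the published fine-tuning recipe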

But I ran into GPU memory issues: I tried reducing MAX_TOKENS to 512 so the data would fit in my 8GB, but I get the following error:

AssertionError: sentence at index 227550 of size 728 exceeds max_tokens limit of 512!

If I set MAX_TOKENS to 1024, I get a CUDA out-of-memory error (expected).


What modifications do I need to make to be able to fine-tune the model on small GPUs (8GB)?
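(For context: the AssertionError above is fairseq’s batcher refusing a single example longer than --max-tokens, which caps the batch size in tokens, so no MAX_TOKENS below the longest example can work. One possible workaround, sketched below under the assumption that dropping over-long examples is acceptable for your use case, is --skip-invalid-size-inputs-valid-test; --truncate-source is another option for trimming long inputs on the source side.)

    # Let fairseq drop examples longer than --max-tokens instead of
    # raising the AssertionError. Note: this silently discards training
    # data, so the resulting model may differ from the published one.
    fairseq-train cnn_dm-bin \
        --max-tokens 512 \
        --skip-invalid-size-inputs-valid-test
        # ...remaining flags as in the published fine-tuning recipe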

@ngoyal2707 @yinhanliu

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 21 (9 by maintainers)

Top GitHub Comments

5 reactions
astariul commented, Dec 24, 2019

@wonjininfo

On my side, I trained BART on 4 x 11GB GPUs.
As mentioned earlier, 11GB is not enough to fit one sample (1024 tokens), so I used --memory-efficient-fp16. Even though my GPU has no native FP16 support, this still reduced the required memory by almost half.

But that was still not enough, so I reduced MAX_TOKENS from 1024 to 928. With these parameters, I could fit one sample on my GPU.
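For anyone replicating this, a sketch of the relevant flags (the surrounding recipe flags are elided; as I understand it, --memory-efficient-fp16 avoids keeping a separate FP32 master copy of the model and optimizer state, which is where most of the saving comes from):

    # Fit one ~928-token sample per 11GB card: a smaller token cap plus
    # memory-efficient mixed precision (no separate FP32 master copy,
    # at some cost in numerical stability).
    CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train cnn_dm-bin \
        --max-tokens 928 \
        --memory-efficient-fp16
        # ...remaining flags as in the published fine-tuning recipe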

With MAX_TOKENS = 928 and --memory-efficient-fp16, I got the following results:

R1 = 43.61 R2 = 20.90 RL = 40.41

They’re a bit lower than the published BART numbers, but that was expected given my settings.


I didn’t try training the model with a lower MAX_TOKENS, as I could already fit one sample with 928.

Merry Christmas 😃

4 reactions
myleott commented, Nov 25, 2019

Note that --memory-efficient-fp16 can produce worse results, especially with small batch sizes. You’re probably better off decreasing the batch size and/or training in FP32; standard FP16 can actually use more memory, since it needs to maintain both an FP32 and an FP16 copy of the model.
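A sketch of the FP32 fallback described above (the numbers are illustrative): drop the FP16 flags entirely, lower --max-tokens until one sample fits, and raise --update-freq to preserve the effective batch size.

    # FP32 fallback: no FP16 flags, so no duplicate model copies.
    # The smaller per-step batch is compensated by more gradient
    # accumulation; scale --update-freq to keep tokens-per-update constant.
    fairseq-train cnn_dm-bin \
        --max-tokens 512 \
        --update-freq 32 \
        --skip-invalid-size-inputs-valid-test
        # (the skip flag drops examples longer than --max-tokens instead
        # of asserting; remaining flags as in the published recipe)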


Top Results From Across the Web

  • BART on CNN/DM : how to train on small GPU ? #1413 - GitHub
    I'm trying to reproduce the CNN/DM results of BART. Unfortunately, I don't have access to good GPU. I only have access to 2...
  • facebook/bart-large-cnn - Hugging Face
    BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.
  • Fine-Tuning the BART Large Model for Text Summarization
    It's 100% free and provides easy access to a GPU, which will speed training up.
  • arXiv:2105.03801v2 [cs.CL] 29 May 2021
    Subsequently, we train different configurations of BART/LoBART models up to our GPU memory limit of 32GiB.
  • How to scale the BERT Training with Nvidia GPUs? - Medium
    In specific, we look into Nvidia's BERT implementation to see how the BERT training can be completed as short as 47 minutes.
