BART.generate: possible to reduce time/memory?
🐛 Performance issues
I ran a quick benchmark comparing HuggingFace’s implementation of BART with FairSeq’s implementation.
You can find the benchmark code here.
Here are my results on a single GTX 1080 GPU (12 GiB of memory):
FP16 - Batch size 16 | s/batch | s/sample |
---|---|---|
FairSeq | 8.8676 | 0.5664 |
HuggingFace | 12.3358 | 0.7879 |

FP16 - Batch size 32 | s/batch | s/sample |
---|---|---|
FairSeq | 17.1247 | 0.5469 |
HuggingFace | OOM | OOM |

FP16 - Batch size 1 | s/sample |
---|---|
FairSeq | 1.6743 |
HuggingFace | 1.8856 |

FP32 - Batch size 1 | s/sample |
---|---|
FairSeq | 1.7865 |
HuggingFace | 2.0670 |

FairSeq is consistently faster than HuggingFace in all my experiments.
This raises a few questions:
- Do you see similar results on your side? Did I make a mistake in my benchmark?
- Why is HuggingFace’s implementation significantly slower?
- Why does HuggingFace’s implementation take more memory (illustrated by the `OOM` with a batch size of 32)?
- Is the release of the Summarization Pipeline going to improve this?
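
For anyone who wants to reproduce the s/batch and s/sample columns, a minimal timing sketch might look like the one below. This is not the benchmark code linked above: `time_generate`, `sync_fn`, and the stand-in `fake_generate` are hypothetical names used only for illustration. With a real GPU model you would pass something like `torch.cuda.synchronize` as `sync_fn`, since CUDA kernels run asynchronously and the clock could otherwise be read too early.

```python
import time
from typing import Callable, Iterable, Sequence, Tuple


def time_generate(
    generate_fn: Callable[[Sequence[str]], object],
    batches: Iterable[Sequence[str]],
    sync_fn: Callable[[], None] = lambda: None,
) -> Tuple[float, float]:
    """Return (mean seconds per batch, mean seconds per sample)."""
    total_time = 0.0
    n_batches = 0
    n_samples = 0
    for batch in batches:
        sync_fn()  # wait for pending device work before starting the clock
        start = time.perf_counter()
        generate_fn(batch)
        sync_fn()  # wait for generation to finish before stopping the clock
        total_time += time.perf_counter() - start
        n_batches += 1
        n_samples += len(batch)
    if n_batches == 0 or n_samples == 0:
        raise ValueError("nothing to time: no batches or no samples")
    return total_time / n_batches, total_time / n_samples


def fake_generate(batch: Sequence[str]) -> list:
    # Hypothetical stand-in for a real model's generate call:
    # it simply sleeps 10 ms per sample.
    time.sleep(0.01 * len(batch))
    return [text.upper() for text in batch]


if __name__ == "__main__":
    docs = [f"document {i}" for i in range(8)]
    batches = [docs[i:i + 4] for i in range(0, len(docs), 4)]
    s_batch, s_sample = time_generate(fake_generate, batches)
    print(f"s/batch={s_batch:.4f}  s/sample={s_sample:.4f}")
```

Running the same harness over both implementations with identical inputs and batch sizes keeps the comparison fair. A warm-up batch before timing is usually worth adding as well.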
On master, the gap has closed considerably! <16GB GPU RAM for fp16, bs=32, and timings much closer:
My numbers are a bit lower than yours because I am on an NVIDIA RTX GPU.
For both memory and speed, FairSeq has a lot of clever tricks that we haven’t implemented yet.