
BART.generate: possible to reduce time/memory?

See original GitHub issue

🐛 Performance issues

I did a quick benchmark between HuggingFace’s implementation of BART and FairSeq’s implementation.

You can find the benchmark code here.


Here are my results, on a single GPU GTX 1080 (12 GiB of memory):

FP16 - Batch size 16    s/batch    s/sample
FairSeq                 8.8676     0.5664
HuggingFace             12.3358    0.7879

FP16 - Batch size 32    s/batch    s/sample
FairSeq                 17.1247    0.5469
HuggingFace             OOM        OOM

FP16 - Batch size 1     s/sample
FairSeq                 1.6743
HuggingFace             1.8856

FP32 - Batch size 1     s/sample
FairSeq                 1.7865
HuggingFace             2.0670

FairSeq is consistently faster than HuggingFace on all my experiments.
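To reproduce numbers like these, a minimal timing harness can be sketched as follows. This is a stdlib-only sketch: `generate_fn` and the dummy inputs are placeholders for a real `model.generate` call and tokenized batches, which are not shown here.

```python
import time

def benchmark_generate(generate_fn, batches, batch_size):
    """Time a generation callable over a list of batches.

    generate_fn is a stand-in for something like model.generate
    (hypothetical here: any callable taking one batch works).
    Returns (seconds_per_batch, seconds_per_sample).
    """
    start = time.perf_counter()
    for batch in batches:
        generate_fn(batch)
    elapsed = time.perf_counter() - start
    seconds_per_batch = elapsed / len(batches)
    seconds_per_sample = seconds_per_batch / batch_size
    return seconds_per_batch, seconds_per_sample

# Example with a dummy "model"; swap in the real generate call.
per_batch, per_sample = benchmark_generate(
    lambda b: sum(b), [[1] * 16] * 4, batch_size=16
)
```

Timing whole batches and dividing by the batch size matches how the s/batch and s/sample columns above relate to each other.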


This raises a few questions:

  • Do you have similar results on your side? Did I mess up my benchmark?
  • Why is HuggingFace’s implementation significantly slower?
  • Why does HuggingFace’s implementation use more memory (illustrated by the OOM at batch size 32)?
  • Will the release of the Summarization Pipeline improve this?

@sshleifer

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
sshleifer commented, Mar 30, 2020

On master, the gap has closed considerably! <16 GB of GPU RAM for fp16, bs=32, and timings much closer (screenshot omitted).

My numbers are a bit lower than yours because I am on an NVIDIA RTX GPU.

1 reaction
sshleifer commented, Mar 6, 2020

  1. Identical to my benchmark for speed. I hadn’t tested memory, but I’m not surprised that their implementation uses less.

For both memory and speed, they have a lot of clever tricks that we haven’t implemented yet.

  2. The Summarization Pipeline will not help, but I will take a longer look at this tomorrow and see if we can improve.
