🐛 [BART] Pipeline OOM
🐛 Bug
I tried running the BART model myself versus running the same model through pipeline.
Running the model directly works fine, but I get an OOM on my GPU when I run the same model through pipeline.
Please see the following code: https://gist.github.com/Colanim/4fae6ab52c05716062a0f20c4a6b9737
(It assumes you have a file cnndm/test.source
with one article per line.)
Run with:
python pipeline_oom.py --model HuggingFace --batch-size 32
(should not produce an OOM on an 11 GB GPU)
and with:
python pipeline_oom.py --model Pipeline --batch-size 32
(should produce an OOM on an 11 GB GPU)
Why does the pipeline use more memory?
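The gist itself is not reproduced here, but a minimal sketch of the two paths might look like the following. The checkpoint name, file handling, and batch size are assumptions, not the gist's actual code:

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer, pipeline

# Assumed checkpoint; the gist may use a different one.
model_name = "facebook/bart-large-cnn"

with open("cnndm/test.source") as f:
    articles = [line.strip() for line in f]
batch = articles[:32]

# "HuggingFace" path: call the model directly. The tokenizer truncates
# every article to the model's maximum input length, so memory per
# example stays bounded.
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to("cuda")
inputs = tokenizer(batch, return_tensors="pt", padding=True,
                   truncation=True, max_length=1024).to("cuda")
with torch.no_grad():
    summary_ids = model.generate(inputs["input_ids"],
                                 attention_mask=inputs["attention_mask"])

# "Pipeline" path: the same checkpoint through the summarization
# pipeline, which (per the comments below) did not truncate long
# articles at the time, so a 1500+-token input could exhaust the GPU.
summarizer = pipeline("summarization", model=model_name, device=0)
summaries = summarizer(batch)
```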
Top GitHub Comments
OK, I figured out the problem: long articles are not getting truncated by the pipeline anymore. Will have a look. If you look at the second val.source example, it's 1583 tokens, and the pipeline does not truncate it, whereas the HuggingFace path does.
Related: #4236
Yes I can replicate, sorry for the slow response. I am still trying to figure out why this is happening.
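Given that diagnosis, one workaround (a sketch, not the fix that eventually landed in the library) is to cap each article at the model's maximum input length before it ever reaches the pipeline:

```python
from transformers import BartTokenizer, pipeline

model_name = "facebook/bart-large-cnn"  # assumed checkpoint
tokenizer = BartTokenizer.from_pretrained(model_name)
summarizer = pipeline("summarization", model=model_name,
                      tokenizer=tokenizer, device=0)

def truncate(article, max_tokens=1024):
    # Encode with truncation at the model's limit, then decode back to
    # text so the pipeline never sees an over-long input.
    ids = tokenizer.encode(article, truncation=True, max_length=max_tokens)
    return tokenizer.decode(ids, skip_special_tokens=True)

with open("cnndm/test.source") as f:
    articles = [line.strip() for line in f]

summaries = summarizer([truncate(a) for a in articles[:32]])
```

Recent transformers releases also accept truncation=True directly in the pipeline call, which makes this encode/decode round trip unnecessary.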