VRAM use grows when using SentenceTransformer.encode + potential fix.
Hey,
I have been using this repository to obtain sentence embeddings for a data set I am currently working on. When using `SentenceTransformer.encode`, I noticed that my VRAM usage grows over time until a CUDA out-of-memory error is raised. Through my own experiments I have found the following:

- Detaching the embeddings before they are extended to `all_embeddings`, using `embeddings = embeddings.to("cpu")`, greatly reduces this growth.
- Even with the above line added, VRAM use still grew, albeit slowly. After also adding `torch.cuda.empty_cache()`, VRAM usage appears to stop growing over time.

The first point makes sense as a fix, but I am unsure why the `empty_cache()` call should be necessary.
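For reference, here is a minimal sketch of the pattern I mean (simplified, not the library's actual code; `model` and `batches` stand in for the real internals of `encode`):

```python
import torch

# Accumulate embeddings on the CPU inside the loop, so the GPU only
# ever holds one batch of results at a time.
all_embeddings = []
for batch in batches:  # placeholder: an iterable of tokenized input batches
    with torch.no_grad():
        embeddings = model(batch)  # placeholder: forward pass returning GPU tensors
    embeddings = embeddings.detach().to("cpu")  # the fix from the first bullet
    torch.cuda.empty_cache()  # the extra call from the second bullet
    all_embeddings.extend(embeddings)
```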
I am using PyTorch 1.6.0, transformers 3.3.1, sentence_transformers 0.3.7.
Have I missed something in the docs, or am I doing something daft? I am happy to submit a pull request if need be.
Thanks,
Martin
Agreed. When `convert_to_numpy=True`, I will change the code so that `detach()` and `.cpu()` happen inside the loop, not afterwards.
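A rough sketch of that change (assumed shape, not the actual commit): with `convert_to_numpy=True`, each batch is detached, moved to host memory, and converted as soon as it is produced, instead of converting the accumulated GPU tensors after the loop.

```python
import torch

convert_to_numpy = True  # mirrors the encode() parameter
all_embeddings = []
for batch in batches:  # placeholder: an iterable of input batches
    with torch.no_grad():
        embeddings = model(batch)  # placeholder: forward pass on the GPU
    if convert_to_numpy:
        # Detach and copy to the CPU immediately, so no reference to the
        # GPU tensor survives past this iteration.
        embeddings = embeddings.detach().cpu().numpy()
    all_embeddings.extend(embeddings)
```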
I have the same observation; the current code would still crash with OOM. I managed to fix it by adding the following lines for each iteration at https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/SentenceTransformer.py#L188:

```python
del embeddings
torch.cuda.empty_cache()
```
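Placed in context, that per-iteration cleanup would look roughly like this (a sketch using the same placeholder loop as above, since the linked line sits inside `encode`'s batching loop):

```python
import torch

all_embeddings = []
for batch in batches:  # placeholder: an iterable of input batches
    with torch.no_grad():
        embeddings = model(batch)  # placeholder: forward pass on the GPU
    all_embeddings.extend(embeddings.detach().cpu())
    # Drop the last Python reference to the GPU tensor, then ask the caching
    # allocator to return the freed blocks to the driver.
    del embeddings
    torch.cuda.empty_cache()
```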