Production - Memory Leak?
Is anyone serving a classifier in production? I'm using Flask, and on every call to predict the process's memory usage grows until the pod hits an OOM error on Kubernetes. I only instantiate this class once and reuse the same instance. This happens even with batch size 1:
import numpy as np
from simpletransformers.classification import MultiLabelClassificationModel
from sklearn.preprocessing import MultiLabelBinarizer

class Transformer:
    def __init__(self, model_dir='model/'):
        # Single shared model instance; feature caching disabled on every call
        self.model = MultiLabelClassificationModel(
            'roberta', model_dir, use_cuda=False,
            args={'no_cache': True, 'use_cached_eval_features': False},
        )
        self.mlb = MultiLabelBinarizer()
        self.mlb.fit([['A', 'B', 'C', 'D']])

    def predict(self, batch):
        # predict() returns binary indicator rows plus the raw model outputs
        predictions, raw_outputs = self.model.predict(batch)
        # Map indicator rows back to the label names fitted above
        labels = self.mlb.inverse_transform(np.asarray(predictions))
        return labels
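
For context on how a class like this gets served, here is a minimal Flask sketch sharing one Transformer instance across requests (the endpoint path and JSON shape are illustrative assumptions, not from the original issue):

from flask import Flask, jsonify, request

app = Flask(__name__)
# Build the model once at import time so every request reuses it,
# matching "I only instantiate this class once" from the report above.
transformer = Transformer()

@app.route('/predict', methods=['POST'])
def predict():
    # Expects a JSON body like {"texts": ["some document", ...]}
    batch = request.get_json()['texts']
    labels = transformer.predict(batch)
    # inverse_transform yields tuples; convert to lists for JSON
    return jsonify([list(row) for row in labels])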
Issue Analytics
- State:
- Created 3 years ago
- Comments: 8 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I have more info. The problem is with multiprocessing. When I run the Flask dev server it works perfectly, but when I run under gunicorn I have to disable use_multiprocessing for it to behave.

Change this (works with the Flask server but not with gunicorn):

model = ClassificationModel('bert', 'outputs', num_labels=11, use_cuda=False, args={'silent': True})

To this (works for both):

model = ClassificationModel('bert', 'outputs', num_labels=11, use_cuda=False, args={'silent': True, 'use_multiprocessing': False})
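
Formatted as a runnable sketch (the simpletransformers import is assumed; the flag is exactly the one named in the comment above):

from simpletransformers.classification import ClassificationModel

# Disabling the multiprocessing pool that simpletransformers uses for
# feature conversion avoids the memory growth seen under gunicorn.
model = ClassificationModel(
    'bert', 'outputs', num_labels=11, use_cuda=False,
    args={'silent': True, 'use_multiprocessing': False},
)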
I am running Flask without gunicorn, and it runs fine for me. I know it warns that the dev server isn't production grade, but I'm sure I've processed 10k+ requests at this stage, live for at least a month.

And an update on my memory issue from the OP: with batch size 1, my instances converged to around 2.9 GB; with batch size 300 they also converged, to around 5-6 GB, so it was OK I guess. My real problem was that my initial 2 GB instances were simply too small.
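
If you want to verify that kind of convergence yourself, one approach is a small helper that logs the process's resident memory around each call. A minimal sketch, assuming psutil is installed (the helper name is an illustration, not from the thread):

import os
import psutil

process = psutil.Process(os.getpid())

def log_rss(tag=''):
    # Resident set size of the current process, in MB
    rss_mb = process.memory_info().rss / 1024 ** 2
    print(f'{tag} RSS: {rss_mb:.1f} MB')

# Wrap each prediction to watch for growth vs. convergence:
# log_rss('before'); transformer.predict(batch); log_rss('after')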