
Memory keeps increasing as sending requests - CPU Inference

See original GitHub issue

Context

I deployed the ResNet-18 eager-mode model from the examples on a local Linux CPU machine. While monitoring memory usage, I found that it keeps increasing as requests are sent, and the memory used by both the frontend and the service worker is never released, even when no requests arrive for a while.

  • torchserve version: 0.3.1
  • torch-model-archiver version: 0.2.0
  • torch version: 1.8.1+cpu
  • torchvision version [if any]: 0.9.1+cpu
  • java version: 11.0.14
  • Operating System and version: Debian GNU/Linux Version 10

Your Environment

  • Installed using source? [yes/no]: yes

  • Are you planning to deploy it using docker container? [yes/no]: yes

  • Is it a CPU or GPU environment?: CPU

  • Using a default/custom handler? [If possible upload/share custom handler/model]: Default

  • What kind of model is it e.g. vision, text, audio?: Vision

  • Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]: Resnet-18

  • Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs:

default_workers_per_model=1
enable_envvars_config=true
vmargs=-Xmx256m
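
For completeness, the surrounding config.properties looks roughly like the sketch below; the addresses and model_store path are just the TorchServe defaults, not copied from my actual file:

```properties
# Sketch of the config.properties used here; only the last three lines
# are the settings quoted above, the rest are assumed TorchServe defaults.
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
model_store=model_store
default_workers_per_model=1
enable_envvars_config=true
vmargs=-Xmx256m
```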

Load Testing and Memory Usage Monitoring

I did the load testing in the following way. Serving the ResNet-18 example with a single worker (default_workers_per_model=1 in config.properties), two related processes show up in ps aux: the frontend and the service worker. I then monitored the memory usage of these two processes while continuously sending requests in a for loop (so, as I understand it, the requests are not concurrent). The frontend memory keeps increasing and never shows any sign of declining. [image]
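
For concreteness, the load loop and the memory sampling look roughly like the sketch below; the endpoint and model name are the defaults from the ResNet-18 example, the image path is a placeholder, and the PIDs are the two processes found via ps aux:

```python
import time
import requests  # assumes the requests package is installed

# Defaults from the TorchServe ResNet-18 example; adjust if your registration differs.
INFERENCE_URL = "http://127.0.0.1:8080/predictions/resnet-18"
IMAGE_PATH = "kitten.jpg"   # the same fixed-size image every time
FRONTEND_PID = 12345        # placeholder: frontend PID from `ps aux`
WORKER_PID = 12346          # placeholder: service-worker PID from `ps aux`


def rss_kb(pid: int) -> int:
    """Read the resident set size (VmRSS, in kB) of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0


with open(IMAGE_PATH, "rb") as f:
    payload = f.read()

# Sequential (non-concurrent) load: one request at a time in a for loop,
# logging the memory of both processes every 100 requests.
for i in range(100_000):
    requests.post(INFERENCE_URL, data=payload)
    if i % 100 == 0:
        print(i, "frontend", rss_kb(FRONTEND_PID), "kB",
              "worker", rss_kb(WORKER_PID), "kB")
        time.sleep(0.1)
```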

I searched the existing issues and set LRU_CACHE_CAPACITY and vmargs as suggested in the relevant ones. The memory usage then seems to be capped by the vmargs setting, but it still increases very slowly during long-term load testing, as shown in the second graph below. The worker memory does show a release pattern. Exactly as in the example, the input image is always the same and of fixed size. [image] [image]

Expected Behavior vs Current Behavior

There are two things that confuse me:

  1. The frontend memory usage increases with incoming requests.
  2. During idle time, the memory usage does not decrease or get released.

Possible Solution

For the LRU cache, I set os.environ["LRU_CACHE_CAPACITY"] = "1" in the handler's initialize(). For vmargs, I set vmargs=-Xmx256m in config.properties. A sketch of the handler change follows below.
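
Roughly, the handler change is something like the sketch below; it subclasses the built-in ImageClassifier handler (module path as in TorchServe 0.3.x), and the class name is just a placeholder:

```python
import os

# Built-in vision handler shipped with TorchServe; module path as in 0.3.x.
from ts.torch_handler.image_classifier import ImageClassifier


class LRUCappedImageClassifier(ImageClassifier):
    """Same as the default ResNet-18 handler, but caps PyTorch's CPU
    allocator LRU cache before the model is loaded."""

    def initialize(self, context):
        # Must be set before tensors are allocated for it to take effect.
        os.environ["LRU_CACHE_CAPACITY"] = "1"
        super().initialize(context)
```

The archive is then rebuilt with this file passed to torch-model-archiver's --handler option instead of the built-in image_classifier name.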

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 10

Top GitHub Comments

1 reaction
lxning commented on Apr 14, 2022

@yangyang-nus-lv Here are the replies to your questions:

  1. “the total allocated memory actually gets doubled”: In VisualVM, “total allocated memory” shows how much memory was allocated during your sampling session; this number keeps increasing for as long as the sampling is running.

  2. “the number of KQueue goes up”: The number of KQueueEventLoopGroup threads is configured by “number_of_netty_threads” and “netty_client_threads” (see the configuration docs). In my test both use the default value (i.e. the number of cores, 16). The number of live KQueueEventLoop threads is dynamic and varies with the incoming message rate; see the config sketch after this list.

  3. “And the heap curve during idle time is increasing linearly”: This is triggered by the RMI TCP Connection thread, which feeds VisualVM (i.e. my profiling tool) with data from the TorchServe JVM. It is not continuously increasing linearly; it recovers every few minutes (see the tail of figure 1).
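
For reference, those two settings live in config.properties; the values below are just the defaults on my 16-core test machine, shown for illustration rather than as recommended settings:

```properties
# Defaults on a 16-core host, written out explicitly for illustration;
# both properties default to the number of logical cores if omitted.
number_of_netty_threads=16
netty_client_threads=16
```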

0 reactions
yangyang-nus-lv commented on Apr 12, 2022

@lxning thank you for the investigation. I still have some questions; could you help me understand the graphs? First, as we can see from the three memory-allocation breakdown graphs, the total allocated memory actually doubles from the first run to the second. Also, the number of KQueue threads goes up from 4 (both the concurrency and the number of workers are set to 4) to 8. So what is the total allocation here; is it an instantaneous value for some time frame? It does not seem to be shown on the heap graph, as the last total memory is around 140 MB while the maximum heap usage is below 80 MB. Finally, the heap curve during idle time increases linearly; what explains that? Thanks in advance.

Read more comments on GitHub >

Top Results From Across the Web

  • Host multiple models in one container behind one endpoint: "For GPU backed instances, a higher amount of instance and GPU memory enables you to have more models loaded and ready to serve..."
  • Demand Layering for Real-Time DNN Inference with ... - arXiv: "When executing a deep neural network (DNN), its model parameters are loaded into GPU memory before execution..."
  • Memory management and patterns in ASP.NET Core: "Allocated memory slowly increases until a GC occurs. Memory increases because the tool allocates custom object to capture data."
  • Performance Guide | TFX - TensorFlow: "Life of a TensorFlow Serving inference request ... larger (more CPU and RAM) machines (i.e. a Deployment with a lower replicas in Kubernetes)..."
  • CPU Inference Performance Boost with "Throughput" Mode in ...: "In addition to the number of inference requests, it is also possible to play ... Intel® Core™ i7-8700 Processor @ 3.20GHz with 16..."
