Memory keeps increasing while sending requests - CPU Inference
Context
I deployed the Resnet-18 eager mode model from the examples on a local Linux CPU machine. Monitoring the memory usage, I found that it keeps increasing as requests are sent, and the memory of both the frontend and the service worker is never released, even when no requests come in for a while.
- torchserve version: 0.3.1
- torch-model-archiver version: 0.2.0
- torch version: 1.8.1+cpu
- torchvision version [if any]: 0.9.1+cpu
- java version: 11.0.14
- Operating System and version: Debian GNU/Linux Version 10
Your Environment
- Installed using source? [yes/no]: yes
- Are you planning to deploy it using docker container? [yes/no]: yes
- Is it a CPU or GPU environment?: CPU
- Using a default/custom handler? [If possible upload/share custom handler/model]: Default
- What kind of model is it e.g. vision, text, audio?: Vision
- Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]: Resnet-18
- Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs:

default_workers_per_model=1
enable_envvars_config=true
vmargs=-Xmx256m
Load Testing and Memory Usage Monitoring
I did the load testing in this way: serving the ResNet-18 example on one worker (by specifying default_workers_per_model=1 in config.properties), two related processes can be found with ps aux: the frontend and the service worker. I then monitored the memory usage of both while continuously sending requests in a for loop (to my understanding this is not concurrent; a sketch of the loop is shown below). The frontend memory keeps increasing and never shows any sign of declining.
I also searched the existing issues and set the LRU cache capacity and vmargs as suggested in other relevant issues. The memory usage then seems to be capped by the vmargs setting, but it still increases very slowly during long-term load testing, as shown in the second graph below. The worker memory does show releases. Exactly as in the example, the input image is always the same and of fixed size.
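For reference, here is a minimal sketch of the sequential request loop and memory sampling described above (the endpoint name, image file, and PIDs are placeholders, not taken from the issue):

```python
# Hypothetical load-test sketch (not from the issue): send the same image to the
# TorchServe inference API in a sequential loop and sample the RSS of the frontend
# and worker processes with `ps`. Endpoint, image file, and PIDs are placeholders.
import subprocess
import time

import requests

URL = "http://127.0.0.1:8080/predictions/resnet-18"  # default inference endpoint
IMAGE = "kitten.jpg"                                  # same fixed-size input every time
FRONTEND_PID = 12345                                  # PID of the Java frontend from `ps aux`
WORKER_PID = 12346                                    # PID of the model service worker

def rss_kb(pid: int) -> int:
    """Resident set size in KB for a given PID, read via `ps`."""
    out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)])
    return int(out.decode().strip())

with open(IMAGE, "rb") as f:
    payload = f.read()

for i in range(10_000):
    requests.post(URL, data=payload)  # one request at a time, no concurrency
    if i % 100 == 0:
        print(i, "frontend KB:", rss_kb(FRONTEND_PID), "worker KB:", rss_kb(WORKER_PID))
        time.sleep(0.1)
```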
Expected Behavior vs Current Behavior
There are two things that confuse me:
- The frontend memory usage increases with incoming requests.
- During idle time, the memory usage never decreases or gets flushed out.
Possible Solution
- For LRU, I set it in the handler's initialize() (see the sketch below): os.environ["LRU_CACHE_CAPACITY"] = "1"
- For vmargs, I set it in config.properties: vmargs=-Xmx256m
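As a rough illustration of where that LRU setting could be placed, here is a sketch assuming a custom handler derived from TorchServe's ImageClassifier (the class name is hypothetical; the report itself used the default handler):

```python
# Hypothetical handler sketch: cap the allocator's LRU cache before the model is loaded.
# The report used the default image_classifier handler, so treat this as illustrative only.
import os

from ts.torch_handler.image_classifier import ImageClassifier


class ResnetLruHandler(ImageClassifier):
    def initialize(self, context):
        # Limit the PyTorch CPU LRU cache, as suggested in related memory-growth issues.
        os.environ["LRU_CACHE_CAPACITY"] = "1"
        super().initialize(context)
```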
Top GitHub Comments
@yangyang-nus-lv Here are the replies to your questions:
“the total allocated memory actually gets doubled”: In VisualVM, “total allocated memory” shows how much memory was allocated during your sampling session, so this number keeps increasing as long as the sampling is running.
“the number of KQueue goes up”: the number of KQueueEventLoopGroup threads is configured by “number_of_netty_threads” and “netty_client_threads” (see the config documentation; an example is sketched below). In my test, both use the default value (i.e. #cores = 16). The number of live KQueueEventLoop threads is dynamic and depends on the incoming message rate.
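For reference, these settings live in config.properties; a hypothetical example of pinning them explicitly (values are illustrative, not from this issue):

```properties
number_of_netty_threads=16
netty_client_threads=16
```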
“And the heap curve during idle time is increasing linearly”: this is caused by the RMI TCP Connection thread, which feeds VisualVM (i.e. my profiling tool) with data from the TorchServe JVM. It is not continuously increasing linearly; it recovers every few minutes (see the tail part of figure 1).
@lxning thank you for the investigation. I still have some questions; could you help me understand the graphs? First, as we can see from the three memory allocation breakdown graphs, the total allocated memory actually gets doubled from the 1st run to the 2nd. Also, the number of KQueue threads goes up from 4 (both the concurrency and the number of workers are set to 4) to 8. So what is the total allocation here, is it a temporary value at some time frame? It does not seem to match the heap graph, as the last total memory is around 140 MB while the maximum heap usage is under 80 MB. And the heap curve during idle time is increasing linearly, what explains that? Thanks in advance.