
Memory keeps increasing as sending requests - CPU Inference

See original GitHub issue

Context

I deployed the ResNet-18 eager-mode model from the examples on a local Linux CPU machine. While monitoring memory usage, I found that it keeps increasing as requests are sent, and the memory used by both the frontend and the service worker is never released, even when no requests arrive for a while.

  • torchserve version: 0.3.1
  • torch-model-archiver version: 0.2.0
  • torch version: 1.8.1+cpu
  • torchvision version [if any]: 0.9.1+cpu
  • java version: 11.0.14
  • Operating System and version: Debian GNU/Linux Version 10

Your Environment

  • Installed using source? [yes/no]: yes

  • Are you planning to deploy it using docker container? [yes/no]: yes

  • Is it a CPU or GPU environment?: CPU

  • Using a default/custom handler? [If possible upload/share custom handler/model]: Default

  • What kind of model is it e.g. vision, text, audio?: Vision

  • Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]: Resnet-18

  • Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs:

default_workers_per_model=1
enable_envvars_config=true
vmargs=-Xmx256m
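
For completeness, the surrounding config.properties looks roughly like the sketch below; the addresses and model_store path are just the TorchServe defaults, not copied from my actual file:

```properties
# Sketch of the config.properties used here; only the last three lines
# are the settings quoted above, the rest are assumed TorchServe defaults.
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
model_store=model_store
default_workers_per_model=1
enable_envvars_config=true
vmargs=-Xmx256m
```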

Load Testing and Memory Usage Monitoring

I did the load testing in the following way. Serving the ResNet-18 example with a single worker (default_workers_per_model=1 in config.properties), two related processes show up in ps aux: the frontend and the service worker. I then monitored the memory usage of these two processes while continuously sending requests in a for loop (so, as I understand it, the requests are not concurrent). The frontend memory keeps increasing and never shows any sign of declining. [image]
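
For concreteness, the load loop and the memory sampling look roughly like the sketch below; the endpoint and model name are the defaults from the ResNet-18 example, the image path is a placeholder, and the PIDs are the two processes found via ps aux:

```python
import time
import requests  # assumes the requests package is installed

# Defaults from the TorchServe ResNet-18 example; adjust if your registration differs.
INFERENCE_URL = "http://127.0.0.1:8080/predictions/resnet-18"
IMAGE_PATH = "kitten.jpg"   # the same fixed-size image every time
FRONTEND_PID = 12345        # placeholder: frontend PID from `ps aux`
WORKER_PID = 12346          # placeholder: service-worker PID from `ps aux`


def rss_kb(pid: int) -> int:
    """Read the resident set size (VmRSS, in kB) of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0


with open(IMAGE_PATH, "rb") as f:
    payload = f.read()

# Sequential (non-concurrent) load: one request at a time in a for loop,
# logging the memory of both processes every 100 requests.
for i in range(100_000):
    requests.post(INFERENCE_URL, data=payload)
    if i % 100 == 0:
        print(i, "frontend", rss_kb(FRONTEND_PID), "kB",
              "worker", rss_kb(WORKER_PID), "kB")
        time.sleep(0.1)
```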

I searched the existing issues and set LRU_CACHE_CAPACITY and vmargs as suggested in the relevant ones. The memory usage then seems to be capped by the vmargs setting, but it still increases very slowly during long-term load testing, as shown in the second graph below. The worker memory does show a release pattern. Exactly as in the example, the input image is always the same and of fixed size. [image] [image]

Expected Behavior vs Current Behavior

There are two things that confuse me:

  1. The frontend memory usage increases with incoming requests.
  2. During idle time, the memory usage does not decrease or get released.

Possible Solution

For the LRU cache, I set os.environ["LRU_CACHE_CAPACITY"] = "1" in the handler's initialize(). For vmargs, I set vmargs=-Xmx256m in config.properties. A sketch of the handler change follows below.
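
Roughly, the handler change is something like the sketch below; it subclasses the built-in ImageClassifier handler (module path as in TorchServe 0.3.x), and the class name is just a placeholder:

```python
import os

# Built-in vision handler shipped with TorchServe; module path as in 0.3.x.
from ts.torch_handler.image_classifier import ImageClassifier


class LRUCappedImageClassifier(ImageClassifier):
    """Same as the default ResNet-18 handler, but caps PyTorch's CPU
    allocator LRU cache before the model is loaded."""

    def initialize(self, context):
        # Must be set before tensors are allocated for it to take effect.
        os.environ["LRU_CACHE_CAPACITY"] = "1"
        super().initialize(context)
```

The archive is then rebuilt with this file passed to torch-model-archiver's --handler option instead of the built-in image_classifier name.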

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 10

Top GitHub Comments

1 reaction
lxning commented on Apr 14, 2022

@yangyang-nus-lv Here are the replies to your questions:

  1. “the total allocated memory actually gets doubled”: In VisualVM, “total allocated memory” shows how much memory was allocated during your sampling session; this number keeps increasing for as long as the sampling is running.

  2. “the number of KQueue goes up”: The number of KQueueEventLoopGroup threads is configured by “number_of_netty_threads” and “netty_client_threads” (see the configuration docs). In my test both use the default value (i.e. the number of cores, 16). The number of live KQueueEventLoop threads is dynamic and varies with the incoming message rate; see the config sketch after this list.

  3. “And the heap curve during idle time is increasing linearly”: This is triggered by the RMI TCP Connection thread, which feeds VisualVM (i.e. my profiling tool) with data from the TorchServe JVM. It is not continuously increasing linearly; it recovers every few minutes (see the tail of figure 1).
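
For reference, those two settings live in config.properties; the values below are just the defaults on my 16-core test machine, shown for illustration rather than as recommended settings:

```properties
# Defaults on a 16-core host, written out explicitly for illustration;
# both properties default to the number of logical cores if omitted.
number_of_netty_threads=16
netty_client_threads=16
```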

0 reactions
yangyang-nus-lv commented on Apr 12, 2022

@lxning thank you for the investigation. I still have some questions; could you help me understand the graphs? First, as we can see from the three memory-allocation breakdown graphs, the total allocated memory actually doubles from the first run to the second. Also, the number of KQueue threads goes up from 4 (both the concurrency and the number of workers are set to 4) to 8. So what is the total allocation here; is it an instantaneous value for some time frame? It does not seem to be shown on the heap graph, as the last total memory is around 140 MB while the maximum heap usage is below 80 MB. Finally, the heap curve during idle time increases linearly; what explains that? Thanks in advance.

Read more comments on GitHub >

Top Results From Across the Web

  • Host multiple models in one container behind one endpoint: "For GPU backed instances, a higher amount of instance and GPU memory enables you to have more models loaded and ready to serve..."
  • Demand Layering for Real-Time DNN Inference with ... - arXiv: "When executing a deep neural network (DNN), its model parameters are loaded into GPU memory before execution..."
  • Memory management and patterns in ASP.NET Core: "Allocated memory slowly increases until a GC occurs. Memory increases because the tool allocates custom object to capture data."
  • Performance Guide | TFX - TensorFlow: "Life of a TensorFlow Serving inference request ... larger (more CPU and RAM) machines (i.e. a Deployment with a lower replicas in Kubernetes)..."
  • CPU Inference Performance Boost with "Throughput" Mode in ...: "In addition to the number of inference requests, it is also possible to play ... Intel® Core™ i7-8700 Processor @ 3.20GHz with 16..."
