agent container: high load and memory issue with many addresses

See original GitHub issue

After 6000 addresses have been added, the agent container constantly shows high load and growing memory usage.

Test setup:

  • The test was done on minikube using EnMasse master (last commit from 2018-03-16).
  • Extra memoryUsage log output (with a preceding gc() invocation) was added to the agent, at a 10s interval.
  • Using a script, 3000 pooled-queue and 3000 standard-anycast addresses were created by directly adding the corresponding K8s ConfigMaps.
  • To avoid creating too many router/broker instances, the credit settings in the plan.yaml were reduced, so that there are 2 brokers and 4 routers in the end for the created addresses.
  • The admin container was configured with a memory setting of “2Gi”; the readiness/liveness check timeout and failureThreshold were increased.
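
For reference, a minimal sketch of what the extra instrumentation could look like (illustrative only, not the actual EnMasse agent code; it assumes the node process is started with --expose-gc so that global.gc is defined):

// Illustrative only: every 10s, force a GC and log process.memoryUsage(),
// roughly matching the "memoryUsage" lines in the agent log below.
// Assumes node was started with --expose-gc, otherwise global.gc is undefined.
setInterval(function () {
    if (typeof global.gc === 'function') {
        global.gc(); // collect first so heapUsed reflects retained objects only
    }
    console.log('%s agent info memoryUsage: %s',
        new Date().toISOString(), JSON.stringify(process.memoryUsage()));
}, 10000);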

Outcome: load/CPU usage of the agent container after all addresses are active:

top - 19:16:14 up  1:04,  0 users,  load average: 11.22, 10.08, 7.89
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0 1990632 1.044g  17880 R  86.1 10.7  13:20.65 node
top - 19:16:15 up  1:04,  0 users,  load average: 11.28, 10.11, 7.91
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0 2004264 1.057g  17880 R  60.0 10.8  13:21.25 node
top - 19:16:16 up  1:04,  0 users,  load average: 11.28, 10.11, 7.91
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0 2028936 1.080g  17880 R  79.2 11.1  13:22.05 node
top - 19:16:17 up  1:04,  0 users,  load average: 11.28, 10.11, 7.91
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0 2039496 1.090g  17880 R  76.0 11.2  13:22.81 node
top - 19:16:18 up  1:04,  0 users,  load average: 11.28, 10.11, 7.91
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0 2050056 1.101g  17880 R  88.0 11.3  13:23.69 node
top - 19:16:19 up  1:04,  0 users,  load average: 11.28, 10.11, 7.91
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0 2063576 1.113g  17880 R  98.0 11.4  13:24.67 node
top - 19:16:20 up  1:04,  0 users,  load average: 10.46, 9.96, 7.88
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0 2076248 1.125g  17880 R  93.1 11.5  13:25.61 node

Memory usage in the agent container is growing (all addresses active here):

2018-03-21T19:11:33.326Z agent info memoryUsage: {"rss":974401536,"heapTotal":861790208,"heapUsed":706318016,"external":25677989}
2018-03-21T19:11:44.273Z agent info memoryUsage: {"rss":970465280,"heapTotal":863887360,"heapUsed":700740512,"external":25026009}
2018-03-21T19:11:55.245Z agent info memoryUsage: {"rss":993488896,"heapTotal":882761728,"heapUsed":710079856,"external":27841162}
2018-03-21T19:12:06.005Z agent info memoryUsage: {"rss":1048915968,"heapTotal":893153280,"heapUsed":736778840,"external":24560837}
2018-03-21T19:12:17.149Z agent info memoryUsage: {"rss":1032990720,"heapTotal":920416256,"heapUsed":751109352,"external":27509007}
2018-03-21T19:12:28.139Z agent info memoryUsage: {"rss":1002680320,"heapTotal":896299008,"heapUsed":746150800,"external":21346512}
2018-03-21T19:12:39.276Z agent info memoryUsage: {"rss":1010847744,"heapTotal":905736192,"heapUsed":737315864,"external":25870521}
2018-03-21T19:12:50.029Z agent info memoryUsage: {"rss":1188294656,"heapTotal":920387584,"heapUsed":753642904,"external":27052203}
2018-03-21T19:13:01.348Z agent info memoryUsage: {"rss":1037041664,"heapTotal":931921920,"heapUsed":777528632,"external":32383563}
2018-03-21T19:13:12.594Z agent info memoryUsage: {"rss":1155084288,"heapTotal":934019072,"heapUsed":773970176,"external":23823693}
2018-03-21T19:13:23.916Z agent info memoryUsage: {"rss":1048711168,"heapTotal":942407680,"heapUsed":785318568,"external":24669724}
2018-03-21T19:13:35.082Z agent info memoryUsage: {"rss":1167368192,"heapTotal":941359104,"heapUsed":779608152,"external":22198545}
2018-03-21T19:13:46.762Z agent info memoryUsage: {"rss":1053048832,"heapTotal":947650560,"heapUsed":795987432,"external":33385782}
2018-03-21T19:13:57.756Z agent info memoryUsage: {"rss":1187852288,"heapTotal":954990592,"heapUsed":791434968,"external":21823205}
2018-03-21T19:14:08.970Z agent info memoryUsage: {"rss":1063333888,"heapTotal":959184896,"heapUsed":782595680,"external":26914455}
2018-03-21T19:14:20.293Z agent info memoryUsage: {"rss":1226559488,"heapTotal":984350720,"heapUsed":822691040,"external":29027519}
2018-03-21T19:14:31.422Z agent info memoryUsage: {"rss":1179160576,"heapTotal":972816384,"heapUsed":824286960,"external":20654142}
2018-03-21T19:14:42.673Z agent info memoryUsage: {"rss":1082613760,"heapTotal":978059264,"heapUsed":820064736,"external":23986994}
2018-03-21T19:14:53.794Z agent info memoryUsage: {"rss":1090334720,"heapTotal":982253568,"heapUsed":817118512,"external":31199966}
2018-03-21T19:15:05.195Z agent info memoryUsage: {"rss":1110749184,"heapTotal":1002176512,"heapUsed":831922624,"external":27058201}
2018-03-21T19:15:16.391Z agent info memoryUsage: {"rss":1106104320,"heapTotal":995885056,"heapUsed":827183080,"external":24427066}
2018-03-21T19:15:28.724Z agent info memoryUsage: {"rss":1107951616,"heapTotal":997994496,"heapUsed":852140232,"external":21534674}
2018-03-21T19:15:40.366Z agent info memoryUsage: {"rss":1249570816,"heapTotal":1002188800,"heapUsed":847199896,"external":23825987}
2018-03-21T19:15:51.997Z agent info memoryUsage: {"rss":1116422144,"heapTotal":1006383104,"heapUsed":859315112,"external":24426907}
2018-03-21T19:16:03.709Z agent info memoryUsage: {"rss":1120550912,"heapTotal":1010577408,"heapUsed":854869616,"external":24401762}
2018-03-21T19:16:14.735Z agent info memoryUsage: {"rss":1255944192,"heapTotal":1010577408,"heapUsed":850406296,"external":23955893}
2018-03-21T19:16:25.702Z agent info memoryUsage: {"rss":1250340864,"heapTotal":1016868864,"heapUsed":876385064,"external":23322844}
2018-03-21T19:16:36.704Z agent info memoryUsage: {"rss":1278078976,"heapTotal":1029451776,"heapUsed":868400384,"external":25833605}
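
As a rough way to quantify the trend from lines like the ones above, the heapUsed values can be extracted from the log and compared; a sketch follows (the local file name agent.log is an assumption, not part of the issue):

// Sketch: parse "memoryUsage" log lines from a saved log file and report how
// much heapUsed grew between the first and last sample.
// The file name "agent.log" is an assumption for illustration.
const fs = require('fs');

const samples = fs.readFileSync('agent.log', 'utf8')
    .split('\n')
    .filter(line => line.includes('memoryUsage:'))
    .map(line => JSON.parse(line.slice(line.indexOf('{'))));

if (samples.length > 1) {
    const deltaMiB =
        (samples[samples.length - 1].heapUsed - samples[0].heapUsed) / (1024 * 1024);
    console.log('heapUsed grew by %s MiB over %d samples',
        deltaMiB.toFixed(1), samples.length);
}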

Note that even with the liveness timeout set to 3s, the failureThreshold to 10, and the period to 30s for the agent container, it still restarted once because of failed liveness checks.

See also the complete logs: logs.zip

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
calohmn commented, Apr 30, 2018

Thanks! Looking at the K8s dashboard, CPU usage of the admin pod is down from 0.9 to 0.04 (with 1000 addresses).

0 reactions
grs commented, Apr 8, 2018

Quick update: I have made some improvements, particularly for steady state, and am finishing off some further changes that will hopefully improve the responsiveness at the point where the 6k addresses are created. I did not quite get this completed yet and am off next week, but will be pursuing this and related issues as top priority on my return on the 16th.
