agent container: high load and memory issue with many addresses
After adding 6000 addresses, the agent container constantly shows high load and growing memory usage.
Test setup: The test was done on minikube using EnMasse master (last commit from 2018-03-16). Extra memoryUsage log output (with a preceding gc() invocation) was added to the agent at a 10s interval. Using a script, 3000 pooled-queue and 3000 standard-anycast addresses were created by directly adding the corresponding K8s ConfigMaps. To avoid creating too many router/broker instances, the credit settings in plan.yaml were reduced so that the created addresses ended up on 2 brokers and 4 routers. The admin container was configured with a memory setting of "2Gi", and the readiness/liveness check timeout and failureThreshold were increased.
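For reference, a minimal sketch of the kind of periodic memory logging that was added for this test (assuming the node process is started with --expose-gc so that global.gc() is available; the log helper below only stands in for the agent's own logger):

```js
// Minimal sketch of the 10s memory logging used for this test.
// Assumes node was started with --expose-gc; 'log' stands in for the agent's logger.
const log = (msg) => console.log(new Date().toISOString(), 'agent info', msg);

setInterval(() => {
    if (global.gc) global.gc();  // force a collection before sampling
    log('memoryUsage: ' + JSON.stringify(process.memoryUsage()));
}, 10000);
```

process.memoryUsage() reports rss, heapTotal, heapUsed and external in bytes, which is the format seen in the log lines below.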
Outcome: Load/CPU usage of the agent container after all addresses are active:
top - 19:16:14 up 1:04, 0 users, load average: 11.22, 10.08, 7.89
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 1990632 1.044g 17880 R 86.1 10.7 13:20.65 node
top - 19:16:15 up 1:04, 0 users, load average: 11.28, 10.11, 7.91
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 2004264 1.057g 17880 R 60.0 10.8 13:21.25 node
top - 19:16:16 up 1:04, 0 users, load average: 11.28, 10.11, 7.91
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 2028936 1.080g 17880 R 79.2 11.1 13:22.05 node
top - 19:16:17 up 1:04, 0 users, load average: 11.28, 10.11, 7.91
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 2039496 1.090g 17880 R 76.0 11.2 13:22.81 node
top - 19:16:18 up 1:04, 0 users, load average: 11.28, 10.11, 7.91
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 2050056 1.101g 17880 R 88.0 11.3 13:23.69 node
top - 19:16:19 up 1:04, 0 users, load average: 11.28, 10.11, 7.91
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 2063576 1.113g 17880 R 98.0 11.4 13:24.67 node
top - 19:16:20 up 1:04, 0 users, load average: 10.46, 9.96, 7.88
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 2076248 1.125g 17880 R 93.1 11.5 13:25.61 node
Memory usage in the agent container keeps growing (all addresses active here):
2018-03-21T19:11:33.326Z agent info memoryUsage: {"rss":974401536,"heapTotal":861790208,"heapUsed":706318016,"external":25677989}
2018-03-21T19:11:44.273Z agent info memoryUsage: {"rss":970465280,"heapTotal":863887360,"heapUsed":700740512,"external":25026009}
2018-03-21T19:11:55.245Z agent info memoryUsage: {"rss":993488896,"heapTotal":882761728,"heapUsed":710079856,"external":27841162}
2018-03-21T19:12:06.005Z agent info memoryUsage: {"rss":1048915968,"heapTotal":893153280,"heapUsed":736778840,"external":24560837}
2018-03-21T19:12:17.149Z agent info memoryUsage: {"rss":1032990720,"heapTotal":920416256,"heapUsed":751109352,"external":27509007}
2018-03-21T19:12:28.139Z agent info memoryUsage: {"rss":1002680320,"heapTotal":896299008,"heapUsed":746150800,"external":21346512}
2018-03-21T19:12:39.276Z agent info memoryUsage: {"rss":1010847744,"heapTotal":905736192,"heapUsed":737315864,"external":25870521}
2018-03-21T19:12:50.029Z agent info memoryUsage: {"rss":1188294656,"heapTotal":920387584,"heapUsed":753642904,"external":27052203}
2018-03-21T19:13:01.348Z agent info memoryUsage: {"rss":1037041664,"heapTotal":931921920,"heapUsed":777528632,"external":32383563}
2018-03-21T19:13:12.594Z agent info memoryUsage: {"rss":1155084288,"heapTotal":934019072,"heapUsed":773970176,"external":23823693}
2018-03-21T19:13:23.916Z agent info memoryUsage: {"rss":1048711168,"heapTotal":942407680,"heapUsed":785318568,"external":24669724}
2018-03-21T19:13:35.082Z agent info memoryUsage: {"rss":1167368192,"heapTotal":941359104,"heapUsed":779608152,"external":22198545}
2018-03-21T19:13:46.762Z agent info memoryUsage: {"rss":1053048832,"heapTotal":947650560,"heapUsed":795987432,"external":33385782}
2018-03-21T19:13:57.756Z agent info memoryUsage: {"rss":1187852288,"heapTotal":954990592,"heapUsed":791434968,"external":21823205}
2018-03-21T19:14:08.970Z agent info memoryUsage: {"rss":1063333888,"heapTotal":959184896,"heapUsed":782595680,"external":26914455}
2018-03-21T19:14:20.293Z agent info memoryUsage: {"rss":1226559488,"heapTotal":984350720,"heapUsed":822691040,"external":29027519}
2018-03-21T19:14:31.422Z agent info memoryUsage: {"rss":1179160576,"heapTotal":972816384,"heapUsed":824286960,"external":20654142}
2018-03-21T19:14:42.673Z agent info memoryUsage: {"rss":1082613760,"heapTotal":978059264,"heapUsed":820064736,"external":23986994}
2018-03-21T19:14:53.794Z agent info memoryUsage: {"rss":1090334720,"heapTotal":982253568,"heapUsed":817118512,"external":31199966}
2018-03-21T19:15:05.195Z agent info memoryUsage: {"rss":1110749184,"heapTotal":1002176512,"heapUsed":831922624,"external":27058201}
2018-03-21T19:15:16.391Z agent info memoryUsage: {"rss":1106104320,"heapTotal":995885056,"heapUsed":827183080,"external":24427066}
2018-03-21T19:15:28.724Z agent info memoryUsage: {"rss":1107951616,"heapTotal":997994496,"heapUsed":852140232,"external":21534674}
2018-03-21T19:15:40.366Z agent info memoryUsage: {"rss":1249570816,"heapTotal":1002188800,"heapUsed":847199896,"external":23825987}
2018-03-21T19:15:51.997Z agent info memoryUsage: {"rss":1116422144,"heapTotal":1006383104,"heapUsed":859315112,"external":24426907}
2018-03-21T19:16:03.709Z agent info memoryUsage: {"rss":1120550912,"heapTotal":1010577408,"heapUsed":854869616,"external":24401762}
2018-03-21T19:16:14.735Z agent info memoryUsage: {"rss":1255944192,"heapTotal":1010577408,"heapUsed":850406296,"external":23955893}
2018-03-21T19:16:25.702Z agent info memoryUsage: {"rss":1250340864,"heapTotal":1016868864,"heapUsed":876385064,"external":23322844}
2018-03-21T19:16:36.704Z agent info memoryUsage: {"rss":1278078976,"heapTotal":1029451776,"heapUsed":868400384,"external":25833605}
Note that even with the liveness timeout set to 3s, the failureThreshold to 10 and the period to 30s for the agent container, it still restarted once because of failed liveness checks.
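For reference, those probe settings correspond to a container spec fragment roughly like the following (values from this test; the probe handler itself is omitted because it is deployment-specific):

```yaml
# Illustrative livenessProbe settings used for this test; the probe handler
# (exec/httpGet) is omitted as it depends on the agent deployment.
livenessProbe:
  timeoutSeconds: 3
  failureThreshold: 10
  periodSeconds: 30
```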
See also the complete logs: logs.zip
Top GitHub Comments
Thanks! Looking at the K8s dashboard, CPU usage of the admin pod is down from 0.9 to 0.04 (with 1000 addresses).
Quick update: I have made some improvements, particularly for steady state, and am finishing off some further changes that will hopefully improve the responsiveness at the point where the 6k addresses are created. I did not quite get this completed yet and am off next week, but will be pursuing this and related issues as top priority on my return on the 16th.