Namerd memory leak

See original GitHub issue

Creating this issue as a follow-up to the discussions with @dadjeibaah in the Slack channel.

Issue Type:

  • Bug report

What happened: Namerd’s memory usage keeps increasing, and garbage collection does not kick in until usage hits 80%–100%. A possible consequence is that, if the GC cannot finish properly, the n4d pods are unable to serve any requests until they are restarted. A side note: Linkerd’s memory usage has never gone above 20%, which suggests Namerd is not releasing objects/resources properly, i.e. a memory leak.
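One way to tell a genuine leak apart from a JVM that merely postpones collection is to watch used heap together with GC counts: if used heap keeps climbing even right after full collections, objects are being retained. The prometheus telemeter configured below may already expose similar JVM gauges; failing that, here is a minimal, illustrative sketch using the standard java.lang.management beans. It is not part of namerd, and the 30-second polling interval is arbitrary.

import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._   // Scala 2.13 converters

// Minimal heap/GC watcher (illustrative only, not namerd code).
// Prints used heap and per-collector GC counts so that "usage keeps rising
// even after full GCs" can be confirmed or ruled out.
object HeapWatch {
  def main(args: Array[String]): Unit = {
    val memory = ManagementFactory.getMemoryMXBean
    val gcs    = ManagementFactory.getGarbageCollectorMXBeans.asScala

    while (true) {
      val heap    = memory.getHeapMemoryUsage
      val usedPct = 100.0 * heap.getUsed / heap.getMax
      val counts  = gcs.map(gc => s"${gc.getName}=${gc.getCollectionCount}").mkString(", ")
      println(f"heap used: ${heap.getUsed}%d / ${heap.getMax}%d bytes ($usedPct%.1f%%), gc counts: $counts")
      Thread.sleep(30000)
    }
  }
}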

Namerd memory usage flow (screenshot, 2019-01-28)

Linkerd memory usage flow (screenshot, 2019-01-28)

What you expected to happen: Namerd’s memory usage flow should be similar to what is shown for Linkerd.

How to reproduce it (as minimally and precisely as possible): N/A

Anything else we need to know?:

Environment:

  • linkerd/namerd version, config files: Namerd 1.6.0

Namerd config file

admin:
  ip: 0.0.0.0

telemetry:
- kind: io.l5d.prometheus
  prefix: l5d_n4d_

storage:
  kind: io.l5d.zk
  pathPrefix: /dtabs
  zkAddrs:
  - host: hostname1
    port: 2181
  - host: hostname2
    port: 2181
  - host: hostname3
    port: 2181

namers:
- kind: io.l5d.k8s
  prefix: -
  host: 127.0.0.1
  port: 8001
  transformers:
  - kind: io.l5d.k8s.daemonset
    namespace: mesh
    k8sHost: 127.0.0.1
    k8sPort: 8001
    port: in-http
    service: l5d
- kind: io.l5d.k8s
  prefix: -
  host: 127.0.0.1
  port: 8001
  transformers:
  - kind: io.l5d.k8s.daemonset
    namespace: mesh
    k8sHost: 127.0.0.1
    k8sPort: 8001
    port: in-grpc
    service: l5d
- kind: io.l5d.k8s
  prefix: -
  host: 127.0.0.1
  port: 8001

interfaces:
- kind: io.l5d.mesh
  ip: 0.0.0.0
  port: 4321
- kind: io.l5d.httpController
  ip: 0.0.0.0
  port: 4180

  • Platform, version, and config files (Kubernetes, DC/OS, etc): Kubernetes

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 4
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
adleong commented, Feb 7, 2019

@adw12382 thanks! This heap report is hugely helpful. We’ve got some theories about what might be happening that we’re trying to validate. I’ll keep this issue updated with our findings.

1 reaction
adw12382 commented, Jan 31, 2019

I investigated a bit and was able to generate a heap dump using jmap on the live n4d pod in our experimental environment.

Here are the steps I used to generate the heap dump:

  • Deploy Namerd into one of our clusters with the replica count set to 3.
  • Get into the pod and execute jmap -dump:format=b,file=namerdump.hprof {Java PID}. Thanks @dadjeibaah for sharing the command. (A JMX-based alternative is sketched just after this list.)
  • Copy the file out and analyze it with Eclipse Memory Analyzer.
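If jmap is not available inside the container, roughly the same dump can be triggered over JMX. This assumes the namerd JVM was started with the com.sun.management.jmxremote options, which is not shown in the config above and is purely an assumption here; the port 9010 and output path are also arbitrary choices for the sketch.

import java.lang.management.ManagementFactory
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}
import com.sun.management.HotSpotDiagnosticMXBean

// Illustrative only: trigger an .hprof heap dump through the HotSpot
// diagnostic MXBean instead of jmap. Requires JMX remote to be enabled on
// the target JVM (an assumption). The dump file is written on the target's
// filesystem, so it still has to be copied out of the pod afterwards.
object RemoteHeapDump {
  def main(args: Array[String]): Unit = {
    val url       = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi")
    val connector = JMXConnectorFactory.connect(url)
    try {
      val diag = ManagementFactory.newPlatformMXBeanProxy(
        connector.getMBeanServerConnection,
        "com.sun.management:type=HotSpotDiagnostic",
        classOf[HotSpotDiagnosticMXBean])
      // live = true keeps only reachable objects, which is what leak hunting needs.
      diag.dumpHeap("/tmp/namerdump.hprof", true)
    } finally {
      connector.close()
    }
  }
}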

In the Leak Suspects section, the analyzer shows the following (screenshot, 2019-01-30):

It seems related to the connections to ZooKeeper, where we store our dtabs. Also, according to the logs emitted by n4d, every 15 minutes we receive around two thousand log lines with the message Attempting to observe dtab/*. I checked the source code; it seems to check whether the dtab exists or is valid, but since I am not familiar with Scala I don’t know much more detail about it.
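For what it’s worth, the shape of leak this would suggest is observation handles that are registered on every observe but never released. The sketch below is purely hypothetical (DtabRegistry, Handle, and observe are made-up names, not namerd’s code); it only illustrates how repeated observation without closing the returned handle retains objects indefinitely.

import scala.collection.mutable

// Hypothetical sketch of the suspected pattern -- NOT namerd source code.
// Each observe() registers a listener the registry holds a strong reference
// to; unless close() is called, re-observing the same path over and over
// grows the set (and everything the listeners capture) without bound.
trait Handle { def close(): Unit }

final class DtabRegistry {
  private val listeners = mutable.Set.empty[String => Unit]

  def observe(path: String)(listener: String => Unit): Handle = {
    listeners += listener
    new Handle { def close(): Unit = listeners -= listener }
  }

  def listenerCount: Int = listeners.size
}

object LeakDemo {
  def main(args: Array[String]): Unit = {
    val registry = new DtabRegistry
    // ~2000 re-observations without closing the previous handles, mirroring
    // the ~2000 "Attempting to observe" log lines seen every 15 minutes.
    for (_ <- 1 to 2000) registry.observe("/dtabs/default")(_ => ())
    println(s"listeners still retained: ${registry.listenerCount}") // 2000
  }
}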

The attachments are reports from Eclipse Memory Analyzer; let me know if there are any other details I can provide.

dominatorTreeReport.zip ThreadDetailsReport.zip namerdumpLeakHunterReport.zip

Read more comments on GitHub.
