Potential memory leak in resolver-dns

Problem Description

Previously, we had an issue where upgrading @grpc/grpc-js from 1.3.x to 1.5.x introduced a channelz memory leak (fixed in this issue for 1.5.10).

Upgrading to 1.5.10 locally seems to be fine, and I have noticed no issues. However, when we upgraded our staging/production environments, a memory leak seems to have come back, with the only difference being the update of @grpc/grpc-js from 1.3.x to 1.5.10.

Using Datadog’s continuous profiler, I wasn’t sure if this was the root issue, but there is definitely a growing heap.

Again, we are running a production service with a single grpc-js server that creates multiple grpc-js clients. The clients are created and destroyed using lightning-pool.

Channelz is disabled on both the server and the clients: each is initialized with 'grpc.enable_channelz': 0.
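As a minimal sketch of the setting described above, one channel-options object can be shared by the server and every client. The `grpc.enable_channelz` key is taken from the report; the variable name `channelOptions`, the address, and the commented-out constructor calls are illustrative assumptions, not the reporter's actual code.

```typescript
// Channel options shared by the server and all clients.
// 'grpc.enable_channelz': 0 turns channelz tracking off, as in the report.
const channelOptions: Record<string, string | number> = {
  'grpc.enable_channelz': 0,
};

// Assumed usage with @grpc/grpc-js (shown as comments, not executed here):
//   import * as grpc from '@grpc/grpc-js';
//   const server = new grpc.Server(channelOptions);
//   const client = new grpc.Client(
//     'localhost:50051',                 // hypothetical address
//     grpc.credentials.createInsecure(),
//     channelOptions,
//   );
```

Keeping the options in one object makes it harder for a client created later by the pool to miss the flag.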

Reproduction Steps

The reproduction steps are still the same as before, except that this time the service is under staging/production load.

Create a single grpc-js server that calls grpc-js clients as needed from a pool resource with channelz disabled. In our case, the server is running and when requests are made, we acquire a client via the pool (factory created once as a singleton) to make a request. These should be able to handle concurrent/multiple requests.
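The acquire/use/release pattern described above can be sketched as follows. This is a hand-rolled stand-in for illustration only: `SimplePool`, `FakeClient`, and `handleRequest` are hypothetical names, and none of this is lightning-pool's or grpc-js's actual API.

```typescript
// Minimal stand-in for a resource pool; NOT lightning-pool's API.
interface Factory<T> {
  create(): T;
  destroy(resource: T): void;
}

class SimplePool<T> {
  private idle: T[] = [];
  private factory: Factory<T>;
  private maxIdle: number;

  constructor(factory: Factory<T>, maxIdle: number) {
    this.factory = factory;
    this.maxIdle = maxIdle;
  }

  // Reuse an idle resource if one exists, otherwise create a new one.
  acquire(): T {
    return this.idle.pop() ?? this.factory.create();
  }

  // Keep up to maxIdle resources around; destroy the rest.
  release(resource: T): void {
    if (this.idle.length < this.maxIdle) {
      this.idle.push(resource);
    } else {
      this.factory.destroy(resource);
    }
  }
}

// Hypothetical client standing in for a grpc-js client.
class FakeClient {
  closed = false;
  call(request: string): string {
    return `echo:${request}`;
  }
  close(): void {
    this.closed = true;
  }
}

let created = 0;
let destroyed = 0;

// The factory is created once and shared, as in the report.
const pool = new SimplePool<FakeClient>(
  {
    create: () => { created += 1; return new FakeClient(); },
    destroy: (client) => { destroyed += 1; client.close(); },
  },
  1,
);

// Server-side handler: acquire a client, make the call, always release.
function handleRequest(request: string): string {
  const client = pool.acquire();
  try {
    return client.call(request);
  } finally {
    pool.release(client);
  }
}
```

In a setup like this, every destroyed client must also release whatever its channel holds (timers, resolvers, load balancers), which is the area the heap profiles in this report point at.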

Environment

  • OS Name: macOS (local testing) and AWS EKS clusters (production)
  • Node Version: 14.16.0
  • Package Name and Version: @grpc/grpc-js@1.5.10

Additional Context

Checking the profiler's Heap Live Size view, it looks like the heap is growing for backoff-timeout.js, resolver-dns.js, load-balancer-child-handler.js, load-balancer-round-robin.js and channel.ts. I let it run for about 2.5 hours, and I am comparing the heap profiles from the first 30 minutes and the last 30 minutes to see what has changed. When comparing with @grpc/grpc-js@1.3.x, these look like they aren’t used.

I see that 1.6.x made some updates to timers, and I was wondering whether that could be related.
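The comparison of the first and last 30 minutes can be sketched as a per-file diff of live heap size. This is only an illustration of the idea: the `heapGrowth` helper, the `HeapByFile` shape, and all byte counts below are made up, and this is not how Datadog's profiler exposes its data.

```typescript
// Heap live size per source file, in bytes, for one profiling window.
type HeapByFile = Record<string, number>;

interface Growth {
  file: string;
  deltaBytes: number;
}

// Returns the files whose live size grew between the two windows, largest first.
function heapGrowth(early: HeapByFile, late: HeapByFile): Growth[] {
  const files = Object.keys({ ...early, ...late });
  const result: Growth[] = [];
  for (const file of files) {
    // A file missing from one window counts as 0 bytes there.
    const deltaBytes = (late[file] ?? 0) - (early[file] ?? 0);
    if (deltaBytes > 0) {
      result.push({ file, deltaBytes });
    }
  }
  return result.sort((a, b) => b.deltaBytes - a.deltaBytes);
}

// Made-up numbers for illustration only; not data from the issue.
const first30Min: HeapByFile = {
  'resolver-dns.js': 1000,
  'backoff-timeout.js': 500,
  'index.js': 2000,
};
const last30Min: HeapByFile = {
  'resolver-dns.js': 4000,
  'backoff-timeout.js': 1500,
  'index.js': 1900,
};

const growth = heapGrowth(first30Min, last30Min);
```

Files that shrink or stay flat are dropped, so the result lists only the candidates worth inspecting.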

Happy to provide more context or help as needed.

NOTE: To clarify the graph, the problem starts and ends within the highlighted intervals. Everything else is from a different process and from rolling the package back.

[Screenshot: Screen Shot 2022-04-05 at 3 29 42 PM]

[Screenshot: Screen Shot 2022-04-05 at 3 41 34 PM (detail view of the other red section from above)]

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 21 (12 by maintainers)

Top GitHub Comments

murgatroid99 commented, Apr 22, 2022

The requested tests have been added in #2105.

@sam-la-compass Can you check if the latest version of grpc-js fixes the original bug for you?

murgatroid99 commented, Nov 15, 2022

In the third image, the tooltip for “addTrace (channelz.js)” seems to be covering up the information about the top three contributors to the heap size. Can you say what those top three items are or share another screenshot that shows them? The top one in particular seems to be a very large fraction of the heap.

I think I can partially explain the failed DNS requests: those addresses look like they are supposed to be an IPv6 address plus a port, but the syntax is wrong: an IPv6 address needs to be enclosed in square brackets ([]) to use a port with it. For example, the proper syntax to represent the address in the top log is [2a00:1450:400f:801::200a]:443. You can fix that if you know what the source of those addresses is, but I am not sure why gRPC would still not treat it as an IPv6 address anyway.
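The bracket rule described above can be sketched with a small helper. `formatHostPort` is a hypothetical name, not part of grpc-js, and the check is a deliberate simplification: it assumes any host containing a colon is an IPv6 literal.

```typescript
// Joins a host and port into "host:port", bracketing IPv6 literals.
// Heuristic (assumption): any host containing ':' is treated as an IPv6 literal.
function formatHostPort(host: string, port: number): string {
  const alreadyBracketed =
    host.charAt(0) === '[' && host.charAt(host.length - 1) === ']';
  const looksLikeIPv6 = host.indexOf(':') !== -1;
  if (looksLikeIPv6 && !alreadyBracketed) {
    return `[${host}]:${port}`;
  }
  return `${host}:${port}`;
}

// The address from the comment above, combined with port 443.
const target = formatHostPort('2a00:1450:400f:801::200a', 443);
```

Applying a helper like this wherever the addresses are produced would avoid the ambiguous unbracketed form before it reaches the resolver.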
