DNS storm with round robin load balancing in grpc-js

Problem description

We are having some problems with the load balancing in grpc-js.

We are seeing uneven distribution of calls to our service pods, which ends up sometimes overloading some of them, while keeping others at very low load.

We think this might be caused by the default load balancing strategy, “pick first”, so we tried enabling round robin, but this caused two issues:

  • The distribution, albeit more consistent, was still not uniform
  • The pods where the client was running started using 3 times the CPU and created a flurry of requests to our DNS

Any ideas how we could address this uneven distribution issue, and what could be wrong with load balancing?

Reproduction steps

Our (singleton) clients get instantiated with the DNS address of the service. The DNS returns the IPs of all available pods for the given service. We enable round robin load balancing by providing this configuration to the client:

'grpc.service_config': JSON.stringify({ loadBalancingConfig: [{ round_robin: {} }], })

There was no other change to the clients besides the lb config.
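
For readers who want to see where that option goes, here is a minimal sketch of building the channel options described above. The helper name `roundRobinChannelOptions` and the target address in the commented usage are assumptions made for illustration; only the `grpc.service_config` key and its JSON value come from the report.

```javascript
// Sketch: build channel options that select round_robin through the service config.
// `roundRobinChannelOptions` is a hypothetical helper name, not part of grpc-js.
function roundRobinChannelOptions() {
  return {
    // The service config must be passed as a JSON string, as in the report.
    'grpc.service_config': JSON.stringify({
      loadBalancingConfig: [{ round_robin: {} }],
    }),
  };
}

// Hypothetical usage (needs the @grpc/grpc-js package; the target is made up):
// const grpc = require('@grpc/grpc-js');
// const client = new grpc.Client(
//   'dns:///my-service.example.internal:50051',
//   grpc.credentials.createInsecure(),
//   roundRobinChannelOptions()
// );

console.log(roundRobinChannelOptions());
```

Keeping the options in one helper makes it easy to confirm that the lb config really is the only difference between the canary clients and the baseline.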

Environment

  • OS name, version and architecture: Debian GNU/Linux 10 (buster) x86
  • Node version: 14.17.6
  • Node installation method: yarn
  • Package name and version: grpc-js 1.4.5

Additional context

When we tried to deploy the config change described above, this is the behavior we saw:

(The baseline is for ~100 pods, while the spike is for just 4 canary pods where a single client configuration was changed.)

CPU: [image: CPU spike]

DNS requests: [image: Screen Shot 2022-01-13 at 11 01 04 AM]

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

2 reactions
murgatroid99 commented, Jan 14, 2022

I have published grpc-js 1.5.1 with some throttling on DNS requests. Can you try that out and see what impact it has?

2 reactions
murgatroid99 commented, Jan 13, 2022

I think I see what is happening here: the clients are failing to connect to some of the addresses returned by the DNS. Those connection failures trigger DNS re-resolution attempts, which do not back off in this situation. The lack of a backoff here is a bug that I will fix. The connection failures would also explain the uneven request distribution.
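
To illustrate what backing off between re-resolution attempts could look like, here is a small sketch of an exponential backoff timer. It is not the grpc-js implementation: the class name `ResolutionBackoff` and the default values (1 s initial delay, doubling, 30 s cap) are assumptions chosen for the example.

```javascript
// Illustrative exponential backoff for repeated DNS re-resolution attempts.
// This is NOT the grpc-js code; names and default values are assumptions.
class ResolutionBackoff {
  constructor({ initialMs = 1000, multiplier = 2, maxMs = 30000 } = {}) {
    this.initialMs = initialMs;
    this.multiplier = multiplier;
    this.maxMs = maxMs;
    this.nextDelayMs = initialMs;
  }

  // Return the delay to wait before the next attempt, then grow it up to the cap.
  next() {
    const delay = this.nextDelayMs;
    this.nextDelayMs = Math.min(this.nextDelayMs * this.multiplier, this.maxMs);
    return delay;
  }

  // Call after a successful connection so the next failure starts small again.
  reset() {
    this.nextDelayMs = this.initialMs;
  }
}

const backoff = new ResolutionBackoff();
const delays = [];
for (let i = 0; i < 7; i++) delays.push(backoff.next());
console.log(delays.join(', ')); // 1000, 2000, 4000, 8000, 16000, 30000, 30000
```

Without a growing delay like this, every failed connection can trigger an immediate new lookup, which is consistent with the burst of DNS requests in the report.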

You can get logs with more information about what is happening here by setting the environment variables GRPC_TRACE=channel,round_robin,subchannel,dns_resolver and GRPC_VERBOSITY=DEBUG.
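
One way to apply those variables without changing the deployment environment is to launch the client as a child process with them set. The sketch below assumes a hypothetical entry script named `client.js` and a hypothetical helper `withGrpcTracing`; the variable names and values are the ones given above.

```javascript
// Sketch: build an environment with the tracing variables from the comment above.
// `withGrpcTracing` is a hypothetical helper; it copies the base environment
// instead of mutating it.
function withGrpcTracing(baseEnv = process.env) {
  return {
    ...baseEnv,
    GRPC_TRACE: 'channel,round_robin,subchannel,dns_resolver',
    GRPC_VERBOSITY: 'DEBUG',
  };
}

// Hypothetical usage: run the client script (name assumed) with tracing enabled.
// const { spawn } = require('child_process');
// spawn(process.execPath, ['client.js'], { env: withGrpcTracing(), stdio: 'inherit' });

const env = withGrpcTracing({});
console.log(env.GRPC_TRACE, env.GRPC_VERBOSITY);
```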
