grpc-js: requests hang indefinitely when executed while the `ResolvingLoadBalancer` class is in the `TRANSIENT_FAILURE` state

Problem description

gRPC requests executed when the ResolvingLoadBalancer class is in the TRANSIENT_FAILURE state hang indefinitely, even after the backoff timer has finished and reset the resolver back to the IDLE state. This can occur when the gRPC client’s DNS resolution fails but the client continues to send requests to the service.

Reproduction steps

  1. Clone this repo: https://github.com/chrskrchr/grpc-js-dns-hang
  2. Run npm install
  3. Run npm run start
  4. The script does the following:
    1. Creates a gRPC client with a dummy service definition and a bogus address
    2. Executes a request to the bogus address that fails immediately with the expected error:
      • Error: 14 UNAVAILABLE: Name resolution failed for target dns:bogus.host
    3. Sleeps for 1000ms, which is notably less than the client’s 2500ms backoff setting, configured on L38 of index.js:
      • "grpc.initial_reconnect_backoff_ms": 2500
    4. Executes a second request to the bogus address

This second request hangs indefinitely, even after the reconnect backoff expires and the resolver has been set back to the IDLE state.
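Until the client library is upgraded, an application-level guard can at least turn a silent hang into a visible error. The following is a minimal sketch, not part of the original report: `withTimeout` is a hypothetical helper that wraps any Node-style callback call, and the "hanging call" in the demo is simulated rather than a real gRPC request.

```javascript
// Hypothetical helper (not part of @grpc/grpc-js): wraps a Node-style
// callback call in a promise that rejects if the callback has not
// fired within timeoutMs.
function withTimeout(invoke, timeoutMs) {
  return new Promise((resolve, reject) => {
    let settled = false;

    const timer = setTimeout(() => {
      if (settled) return;
      settled = true;
      reject(new Error(`call did not complete within ${timeoutMs}ms`));
    }, timeoutMs);

    invoke((err, response) => {
      if (settled) return; // ignore a callback that arrives after the timeout
      settled = true;
      clearTimeout(timer);
      if (err) reject(err);
      else resolve(response);
    });
  });
}

// Demo: a call whose callback never fires stands in for the hanging request.
const simulatedHangingCall = (callback) => { /* never calls back */ };

withTimeout(simulatedHangingCall, 50).catch((err) => {
  console.log('guard fired:', err.message);
});
```

With a real client, the first argument would be something like `(cb) => client.ping(request, cb)`, where `client.ping` is an assumed method name. Note that the trace in this report shows `deadline=Infinity`, meaning no deadline was set on the calls; grpc-js call options also accept a `deadline`, which is the usual gRPC way to bound a call, though this sketch does not verify how a deadline behaves in the failure state described here.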

Environment

  • macOS Monterey (12.3)
  • Node v14.18.1
  • Node installation method: nvm
  • grpc-js@1.6.3

Additional context

Output from the script when the second request is executed while the resolver is in the TRANSIENT_FAILURE state, causing the second request to hang indefinitely:

➜  grpc-js-dns-hang git:(master) ✗ npm run start

> grpc-js-dns-hang@1.0.0 start /Users/chris.karcher/src/care/grpc-js-dns-hang
> GRPC_VERBOSITY=DEBUG GRPC_TRACE=all node index.js

D 2022-04-12T19:58:19.914Z | index | Loading @grpc/grpc-js version 1.6.3
D 2022-04-12T19:58:19.992Z | resolving_load_balancer | dns:bogus.host IDLE -> IDLE
D 2022-04-12T19:58:19.992Z | connectivity_state | (1) dns:bogus.host IDLE -> IDLE
D 2022-04-12T19:58:19.992Z | dns_resolver | Resolver constructed for target dns:bogus.host
D 2022-04-12T19:58:19.994Z | channel | (1) dns:bogus.host Channel constructed with options {
  "grpc.initial_reconnect_backoff_ms": 2500
}
D 2022-04-12T19:58:19.994Z | channel_stacktrace | (1) Channel constructed 
    at new ChannelImplementation (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/channel.js:189:23)
    at new Client (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client.js:62:36)
    at new ServiceClientImpl (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/make-client.js:58:5)
    at Object.<anonymous> (/Users/chris.karcher/src/care/grpc-js-dns-hang/index.js:37:16)
    at Module._compile (internal/modules/cjs/loader.js:1085:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
    at Module.load (internal/modules/cjs/loader.js:950:32)
    at Function.Module._load (internal/modules/cjs/loader.js:790:12)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)
    at internal/main/run_main_module.js:17:47
executing request #1
D 2022-04-12T19:58:19.996Z | channel | (1) dns:bogus.host createCall [0] method="/PingAPI/Ping", deadline=Infinity
D 2022-04-12T19:58:19.997Z | call_stream | [0] Sending metadata
D 2022-04-12T19:58:19.997Z | dns_resolver | Looking up DNS hostname bogus.host
D 2022-04-12T19:58:19.999Z | resolving_load_balancer | dns:bogus.host IDLE -> CONNECTING
D 2022-04-12T19:58:20.000Z | connectivity_state | (1) dns:bogus.host IDLE -> CONNECTING
D 2022-04-12T19:58:20.000Z | resolving_load_balancer | dns:bogus.host CONNECTING -> CONNECTING
D 2022-04-12T19:58:20.000Z | connectivity_state | (1) dns:bogus.host CONNECTING -> CONNECTING
D 2022-04-12T19:58:20.000Z | channel | (1) dns:bogus.host callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-04-12T19:58:20.001Z | call_stream | [0] write() called with message of length 0
D 2022-04-12T19:58:20.001Z | call_stream | [0] end() called
D 2022-04-12T19:58:20.002Z | call_stream | [0] deferring writing data chunk of length 5
D 2022-04-12T19:58:20.066Z | dns_resolver | Resolution error for target dns:bogus.host: getaddrinfo ENOTFOUND bogus.host
D 2022-04-12T19:58:20.067Z | resolving_load_balancer | dns:bogus.host CONNECTING -> TRANSIENT_FAILURE
D 2022-04-12T19:58:20.067Z | channel | (1) dns:bogus.host callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-04-12T19:58:20.067Z | connectivity_state | (1) dns:bogus.host CONNECTING -> TRANSIENT_FAILURE
D 2022-04-12T19:58:20.067Z | channel | (1) dns:bogus.host Name resolution failed with calls queued for config selection
D 2022-04-12T19:58:20.067Z | call_stream | [0] cancelWithStatus code: 14 details: "Name resolution failed for target dns:bogus.host"
D 2022-04-12T19:58:20.067Z | call_stream | [0] ended with status: code=14 details="Name resolution failed for target dns:bogus.host"
Error: 14 UNAVAILABLE: Name resolution failed for target dns:bogus.host
    at Object.callErrorFromStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client.js:180:52)
    at Object.onReceiveStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:365:141)
    at Object.onReceiveStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
    at /Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11) {
  code: 14,
  details: 'Name resolution failed for target dns:bogus.host',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }
}
request #1 finished
sleeping...
executing request #2
D 2022-04-12T19:58:21.071Z | channel | (1) dns:bogus.host createCall [1] method="/PingAPI/Ping", deadline=Infinity
D 2022-04-12T19:58:21.071Z | call_stream | [1] Sending metadata
D 2022-04-12T19:58:21.072Z | channel | (1) dns:bogus.host callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-04-12T19:58:21.072Z | call_stream | [1] write() called with message of length 0
D 2022-04-12T19:58:21.072Z | call_stream | [1] end() called
D 2022-04-12T19:58:21.072Z | call_stream | [1] deferring writing data chunk of length 5
D 2022-04-12T19:58:22.568Z | resolving_load_balancer | dns:bogus.host TRANSIENT_FAILURE -> IDLE
D 2022-04-12T19:58:22.568Z | channel | (1) dns:bogus.host callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-04-12T19:58:22.568Z | connectivity_state | (1) dns:bogus.host TRANSIENT_FAILURE -> IDLE

If the sleep duration on L52 of index.js is increased beyond the client’s backoff setting (e.g., to 5000ms), the resolver returns to the IDLE state before the second request is sent, and that request fails immediately, as expected, just like the first request.

➜  grpc-js-dns-hang git:(master) ✗ npm run start

> grpc-js-dns-hang@1.0.0 start /Users/chris.karcher/src/care/grpc-js-dns-hang
> GRPC_VERBOSITY=DEBUG GRPC_TRACE=all node index.js

D 2022-04-12T19:58:53.273Z | index | Loading @grpc/grpc-js version 1.6.3
D 2022-04-12T19:58:53.316Z | resolving_load_balancer | dns:bogus.host IDLE -> IDLE
D 2022-04-12T19:58:53.317Z | connectivity_state | (1) dns:bogus.host IDLE -> IDLE
D 2022-04-12T19:58:53.317Z | dns_resolver | Resolver constructed for target dns:bogus.host
D 2022-04-12T19:58:53.318Z | channel | (1) dns:bogus.host Channel constructed with options {
  "grpc.initial_reconnect_backoff_ms": 2500
}
D 2022-04-12T19:58:53.318Z | channel_stacktrace | (1) Channel constructed 
    at new ChannelImplementation (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/channel.js:189:23)
    at new Client (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client.js:62:36)
    at new ServiceClientImpl (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/make-client.js:58:5)
    at Object.<anonymous> (/Users/chris.karcher/src/care/grpc-js-dns-hang/index.js:37:16)
    at Module._compile (internal/modules/cjs/loader.js:1085:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
    at Module.load (internal/modules/cjs/loader.js:950:32)
    at Function.Module._load (internal/modules/cjs/loader.js:790:12)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)
    at internal/main/run_main_module.js:17:47
executing request #1
D 2022-04-12T19:58:53.320Z | channel | (1) dns:bogus.host createCall [0] method="/PingAPI/Ping", deadline=Infinity
D 2022-04-12T19:58:53.321Z | call_stream | [0] Sending metadata
D 2022-04-12T19:58:53.322Z | dns_resolver | Looking up DNS hostname bogus.host
D 2022-04-12T19:58:53.323Z | resolving_load_balancer | dns:bogus.host IDLE -> CONNECTING
D 2022-04-12T19:58:53.323Z | connectivity_state | (1) dns:bogus.host IDLE -> CONNECTING
D 2022-04-12T19:58:53.323Z | resolving_load_balancer | dns:bogus.host CONNECTING -> CONNECTING
D 2022-04-12T19:58:53.323Z | connectivity_state | (1) dns:bogus.host CONNECTING -> CONNECTING
D 2022-04-12T19:58:53.324Z | channel | (1) dns:bogus.host callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-04-12T19:58:53.325Z | call_stream | [0] write() called with message of length 0
D 2022-04-12T19:58:53.326Z | call_stream | [0] end() called
D 2022-04-12T19:58:53.327Z | call_stream | [0] deferring writing data chunk of length 5
D 2022-04-12T19:58:53.328Z | dns_resolver | Resolution error for target dns:bogus.host: getaddrinfo ENOTFOUND bogus.host
D 2022-04-12T19:58:53.328Z | resolving_load_balancer | dns:bogus.host CONNECTING -> TRANSIENT_FAILURE
D 2022-04-12T19:58:53.328Z | channel | (1) dns:bogus.host callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-04-12T19:58:53.328Z | connectivity_state | (1) dns:bogus.host CONNECTING -> TRANSIENT_FAILURE
D 2022-04-12T19:58:53.328Z | channel | (1) dns:bogus.host Name resolution failed with calls queued for config selection
D 2022-04-12T19:58:53.328Z | call_stream | [0] cancelWithStatus code: 14 details: "Name resolution failed for target dns:bogus.host"
D 2022-04-12T19:58:53.328Z | call_stream | [0] ended with status: code=14 details="Name resolution failed for target dns:bogus.host"
Error: 14 UNAVAILABLE: Name resolution failed for target dns:bogus.host
    at Object.callErrorFromStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client.js:180:52)
    at Object.onReceiveStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:365:141)
    at Object.onReceiveStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
    at /Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11) {
  code: 14,
  details: 'Name resolution failed for target dns:bogus.host',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }
}
request #1 finished
sleeping...
D 2022-04-12T19:58:55.829Z | resolving_load_balancer | dns:bogus.host TRANSIENT_FAILURE -> IDLE
D 2022-04-12T19:58:55.830Z | connectivity_state | (1) dns:bogus.host TRANSIENT_FAILURE -> IDLE
executing request #2
D 2022-04-12T19:58:58.332Z | channel | (1) dns:bogus.host createCall [1] method="/PingAPI/Ping", deadline=Infinity
D 2022-04-12T19:58:58.332Z | call_stream | [1] Sending metadata
D 2022-04-12T19:58:58.332Z | dns_resolver | Looking up DNS hostname bogus.host
D 2022-04-12T19:58:58.333Z | resolving_load_balancer | dns:bogus.host IDLE -> CONNECTING
D 2022-04-12T19:58:58.333Z | connectivity_state | (1) dns:bogus.host IDLE -> CONNECTING
D 2022-04-12T19:58:58.333Z | resolving_load_balancer | dns:bogus.host CONNECTING -> CONNECTING
D 2022-04-12T19:58:58.333Z | connectivity_state | (1) dns:bogus.host CONNECTING -> CONNECTING
D 2022-04-12T19:58:58.333Z | channel | (1) dns:bogus.host callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-04-12T19:58:58.333Z | call_stream | [1] write() called with message of length 0
D 2022-04-12T19:58:58.333Z | call_stream | [1] end() called
D 2022-04-12T19:58:58.333Z | call_stream | [1] deferring writing data chunk of length 5
D 2022-04-12T19:58:58.334Z | dns_resolver | Resolution error for target dns:bogus.host: getaddrinfo ENOTFOUND bogus.host
D 2022-04-12T19:58:58.334Z | resolving_load_balancer | dns:bogus.host CONNECTING -> TRANSIENT_FAILURE
D 2022-04-12T19:58:58.334Z | channel | (1) dns:bogus.host callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-04-12T19:58:58.334Z | connectivity_state | (1) dns:bogus.host CONNECTING -> TRANSIENT_FAILURE
D 2022-04-12T19:58:58.334Z | channel | (1) dns:bogus.host Name resolution failed with calls queued for config selection
D 2022-04-12T19:58:58.334Z | call_stream | [1] cancelWithStatus code: 14 details: "Name resolution failed for target dns:bogus.host"
D 2022-04-12T19:58:58.334Z | call_stream | [1] ended with status: code=14 details="Name resolution failed for target dns:bogus.host"
Error: 14 UNAVAILABLE: Name resolution failed for target dns:bogus.host
    at Object.callErrorFromStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client.js:180:52)
    at Object.onReceiveStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:365:141)
    at Object.onReceiveStatus (/Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
    at /Users/chris.karcher/src/care/grpc-js-dns-hang/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11) {
  code: 14,
  details: 'Name resolution failed for target dns:bogus.host',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }
}
finished
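The difference between the two traces above comes down to whether `createCall` happens while the `resolving_load_balancer` is still in `TRANSIENT_FAILURE`. As a rough aid for checking other traces, here is a small sketch that scans `GRPC_TRACE` output for such calls. The function name and the regular expressions are assumptions based only on the log lines shown in this report, not on a documented log format.

```javascript
// Patterns inferred from the trace lines in this report (an assumption,
// not a documented format).
const TRANSITION = /^D (\S+) \| resolving_load_balancer \| \S+ (\w+) -> (\w+)$/;
const CREATE_CALL = /^D (\S+) \| channel \| .* createCall \[(\d+)\]/;

// Returns the calls that were created while the resolving load balancer
// was in TRANSIENT_FAILURE, with how far into that state they started.
function findCallsDuringBackoff(lines) {
  let failureSince = null; // epoch ms when TRANSIENT_FAILURE was entered
  const suspects = [];

  for (const rawLine of lines) {
    const line = rawLine.trim();

    const transition = TRANSITION.exec(line);
    if (transition) {
      const [, timestamp, , toState] = transition;
      failureSince = toState === 'TRANSIENT_FAILURE' ? Date.parse(timestamp) : null;
      continue;
    }

    const call = CREATE_CALL.exec(line);
    if (call && failureSince !== null) {
      suspects.push({
        callId: Number(call[2]),
        msIntoBackoff: Date.parse(call[1]) - failureSince,
      });
    }
  }
  return suspects;
}

// Lines copied from the first (hanging) trace in this report.
const hangingTrace = [
  'D 2022-04-12T19:58:19.996Z | channel | (1) dns:bogus.host createCall [0] method="/PingAPI/Ping", deadline=Infinity',
  'D 2022-04-12T19:58:19.999Z | resolving_load_balancer | dns:bogus.host IDLE -> CONNECTING',
  'D 2022-04-12T19:58:20.067Z | resolving_load_balancer | dns:bogus.host CONNECTING -> TRANSIENT_FAILURE',
  'D 2022-04-12T19:58:21.071Z | channel | (1) dns:bogus.host createCall [1] method="/PingAPI/Ping", deadline=Infinity',
  'D 2022-04-12T19:58:22.568Z | resolving_load_balancer | dns:bogus.host TRANSIENT_FAILURE -> IDLE',
];

console.log(findCallsDuringBackoff(hangingTrace));
```

For the first trace this flags call 1, created 1004ms after the transition to `TRANSIENT_FAILURE` and well inside the 2500ms backoff. Run against the second trace, it flags nothing, because call 1 there is created after the `TRANSIENT_FAILURE -> IDLE` transition.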

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

qiuspike commented, Jul 19, 2022 (1 reaction)

Thanks a lot. This issue really helped me solving my problems.

murgatroid99 commented, Apr 14, 2022 (1 reaction)

I published that change in version 1.6.4. Please try it out.
