Connecting to shut-down servers hangs the client for a random time before emitting error 14 UNAVAILABLE

Problem description

I have configured 4 clients to make unary calls to 4 servers. Each client is an instance of a class and is stored in a JS Map. Some calls must be made on all servers, and others on only one. Because a few servers are shut down, I send the same call with each client (each instance is bound to the ip:port of one server) and get error 14 UNAVAILABLE for the servers that are down. That's fine! But randomly, some clients hang while waiting for the connection error. So every call to a shut-down server gives me error 14 after a random delay, from milliseconds up to 30 seconds! Why? I expected the error to be emitted immediately. This causes me problems, because I need to check all servers before continuing with the application code.
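The "check all servers before continuing" step described above can be sketched as a parallel fan-out over the Map. This is only an illustration under assumptions: `checkAll` and the per-server `probe` functions are hypothetical names that do not come from the issue or from @grpc/grpc-js, and each probe is assumed to return a promise that wraps one unary call.

```javascript
// Hypothetical helper: run one probe per server in parallel and record
// whether it succeeded and how long it took to settle.
// `probes` is a Map of address -> function returning a promise.
async function checkAll(probes) {
  const tasks = [...probes.entries()].map(async ([address, probe]) => {
    const started = Date.now();
    try {
      const value = await probe();
      return { address, ok: true, elapsedMs: Date.now() - started, value };
    } catch (error) {
      // A shut-down server is expected to land here (for example with code 14).
      return { address, ok: false, elapsedMs: Date.now() - started, error };
    }
  });
  // No task rejects, so Promise.all resolves once every probe has settled.
  return Promise.all(tasks);
}
```

Because the probes run concurrently, the total wait is bounded by the slowest server rather than the sum of all of them, and `elapsedMs` makes the random delays visible per address.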

Reproduction steps

Create a simple gRPC client and try to make a unary call to any nonexistent server ip:port. You get an error 14. Repeat the same call a few times, and you will see that sometimes the client hangs for seconds.

Environment

  • OS name, version and architecture: Linux CentOS 7, 64-bit
  • Node Version 14.18.0
  • Node installation method: Binary
  • Package name and version @grpc/grpc-js@1.5.1 (tested with 1.5.3 also)

Additional context

I simplified my setup to a single client and reproduced the same issue. I think this issue is related to others such as #1591, #1815, or https://stackoverflow.com/questions/61565913/why-does-my-node-js-grpc-client-take-3-seconds-to-send-a-request-to-my-python-gr

Of course, I set a deadline on the client to avoid hanging while waiting for error 14, but this is a workaround. A 1 second deadline is OK, but it is not a fix.
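For readers who want the same workaround, a per-call deadline can be sketched as follows. The helper names `deadlineAfter` and `unaryWithDeadline` are hypothetical. The only library behavior assumed is that a generated @grpc/grpc-js client method accepts `(request, options, callback)` and that `options.deadline` may be a `Date`.

```javascript
// Build an absolute deadline `ms` milliseconds from now.
function deadlineAfter(ms) {
  return new Date(Date.now() + ms);
}

// Promisified unary call with a per-call deadline.
// `client` is assumed to be a generated gRPC client and `method` the name
// of a unary method on it (both supplied by the caller).
function unaryWithDeadline(client, method, request, timeoutMs) {
  return new Promise((resolve, reject) => {
    const options = { deadline: deadlineAfter(timeoutMs) };
    client[method](request, options, (err, response) => {
      if (err) reject(err);
      else resolve(response);
    });
  });
}
```

A call such as `unaryWithDeadline(client, 'getAffiliation', request, 1000)` (method name assumed from the trace in this issue) would then settle within about one second. Note that a call cut off by its deadline fails with status 4 DEADLINE_EXCEEDED rather than 14 UNAVAILABLE, so code that checks server health should treat both as "not reachable".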

This is the console output. As you can see, the call waits almost 30 seconds before the connection attempt fails. I want it to fail instantly.

D 2022-01-25T23:35:05.884Z | channel | (1) dns:192.168.10.20:50051 createCall [11] method="/dvdriver.DriverService/GetAffiliation", deadline=Infinity
D 2022-01-25T23:35:05.885Z | call_stream | [11] Sending metadata
D 2022-01-25T23:35:05.885Z | channel | (1) dns:192.168.10.20:50051 Pick result: QUEUE subchannel: undefined status: undefined undefined
D 2022-01-25T23:35:05.885Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2022-01-25T23:35:05.885Z | call_stream | [11] write() called with message of length 27
D 2022-01-25T23:35:05.885Z | call_stream | [11] end() called
D 2022-01-25T23:35:05.885Z | resolving_load_balancer | dns:192.168.10.20:50051 IDLE -> CONNECTING
D 2022-01-25T23:35:05.885Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
D 2022-01-25T23:35:05.885Z | channel | (1) dns:192.168.10.20:50051 Pick result: QUEUE subchannel: undefined status: undefined undefined
D 2022-01-25T23:35:05.885Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2022-01-25T23:35:05.885Z | connectivity_state | (1) dns:192.168.10.20:50051 IDLE -> CONNECTING
D 2022-01-25T23:35:05.885Z | resolving_load_balancer | dns:192.168.10.20:50051 CONNECTING -> CONNECTING
D 2022-01-25T23:35:05.885Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
D 2022-01-25T23:35:05.885Z | channel | (1) dns:192.168.10.20:50051 Pick result: QUEUE subchannel: undefined status: undefined undefined
D 2022-01-25T23:35:05.885Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2022-01-25T23:35:05.886Z | connectivity_state | (1) dns:192.168.10.20:50051 CONNECTING -> CONNECTING
D 2022-01-25T23:35:05.886Z | call_stream | [11] deferring writing data chunk of length 32

<--------------- 3 SECONDS LATER ------------->
D 2022-01-25T23:35:08.852Z | subchannel_refcount | (16) 192.168.10.20:50051 refcount 1 -> 0



<--------------- 27 SECONDS LATER ------------->
D 2022-01-25T23:35:32.889Z | dns_resolver | Returning IP address for target dns:192.168.10.20:50051
D 2022-01-25T23:35:32.890Z | pick_first | Connect to address list 192.168.10.20:50051
D 2022-01-25T23:35:32.890Z | subchannel | (19) 192.168.10.20:50051 Subchannel constructed with options {}
D 2022-01-25T23:35:32.890Z | subchannel_refcount | (19) 192.168.10.20:50051 refcount 0 -> 1
D 2022-01-25T23:35:32.890Z | subchannel_refcount | (19) 192.168.10.20:50051 refcount 1 -> 2
D 2022-01-25T23:35:32.890Z | pick_first | Start connecting to subchannel with address 192.168.10.20:50051
D 2022-01-25T23:35:32.890Z | pick_first | IDLE -> CONNECTING
D 2022-01-25T23:35:32.890Z | resolving_load_balancer | dns:192.168.10.20:50051 CONNECTING -> CONNECTING
D 2022-01-25T23:35:32.890Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
D 2022-01-25T23:35:32.890Z | channel | (1) dns:192.168.10.20:50051 Pick result: QUEUE subchannel: undefined status: undefined undefined
D 2022-01-25T23:35:32.890Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2022-01-25T23:35:32.890Z | connectivity_state | (1) dns:192.168.10.20:50051 CONNECTING -> CONNECTING
D 2022-01-25T23:35:32.891Z | subchannel | (19) 192.168.10.20:50051 IDLE -> CONNECTING
D 2022-01-25T23:35:32.891Z | pick_first | CONNECTING -> CONNECTING
D 2022-01-25T23:35:32.891Z | resolving_load_balancer | dns:192.168.10.20:50051 CONNECTING -> CONNECTING
D 2022-01-25T23:35:32.891Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
D 2022-01-25T23:35:32.891Z | channel | (1) dns:192.168.10.20:50051 Pick result: QUEUE subchannel: undefined status: undefined undefined
D 2022-01-25T23:35:32.891Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2022-01-25T23:35:32.891Z | connectivity_state | (1) dns:192.168.10.20:50051 CONNECTING -> CONNECTING
D 2022-01-25T23:35:32.891Z | channel | (1) dns:192.168.10.20:50051 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=1
D 2022-01-25T23:35:32.891Z | subchannel | (19) 192.168.10.20:50051 creating HTTP/2 session
D 2022-01-25T23:35:32.895Z | subchannel | (19) 192.168.10.20:50051 connection closed with error connect EHOSTUNREACH 192.168.10.20:50051
D 2022-01-25T23:35:32.895Z | subchannel | (19) 192.168.10.20:50051 connection closed
D 2022-01-25T23:35:32.895Z | subchannel | (19) 192.168.10.20:50051 CONNECTING -> TRANSIENT_FAILURE
D 2022-01-25T23:35:32.895Z | pick_first | CONNECTING -> TRANSIENT_FAILURE
D 2022-01-25T23:35:32.895Z | resolving_load_balancer | dns:192.168.10.20:50051 CONNECTING -> TRANSIENT_FAILURE
D 2022-01-25T23:35:32.895Z | channel | (1) dns:192.168.10.20:50051 Pick result: TRANSIENT_FAILURE subchannel: undefined status: 14 No connection established
D 2022-01-25T23:35:32.895Z | call_stream | [11] cancelWithStatus code: 14 details: "No connection established"
D 2022-01-25T23:35:32.895Z | call_stream | [11] ended with status: code=14 details="No connection established"
D 2022-01-25T23:35:32.895Z | connectivity_state | (1) dns:192.168.10.20:50051 CONNECTING -> TRANSIENT_FAILURE

<-------- FINALLY GET ERROR 14!!!! --------------->
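The pauses marked by hand in the trace above can also be located mechanically. The sketch below is a hypothetical helper (`parseTraceTime` and `findGaps` are illustrative names) that assumes every trace line starts with a severity letter, a space, and an ISO timestamp followed by ` | `, as in the output shown here.

```javascript
// Extract the timestamp (in epoch milliseconds) from a trace line such as
// "D 2022-01-25T23:35:05.884Z | channel | ...". Returns null for other lines.
function parseTraceTime(line) {
  const match = /^[A-Z] (\S+Z) \| /.exec(line);
  if (!match) return null;
  const time = Date.parse(match[1]);
  return Number.isNaN(time) ? null : time;
}

// Report every pair of consecutive trace lines separated by at least
// `thresholdMs` milliseconds. Lines without a timestamp are skipped.
function findGaps(lines, thresholdMs) {
  const gaps = [];
  let previous = null;
  for (const line of lines) {
    const time = parseTraceTime(line);
    if (time === null) continue;
    if (previous !== null && time - previous.time >= thresholdMs) {
      gaps.push({ gapMs: time - previous.time, before: previous.line, after: line });
    }
    previous = { time, line };
  }
  return gaps;
}
```

Run over this trace with a threshold of 1000 ms, it reports two gaps: 2966 ms before the `subchannel_refcount` line and 24037 ms before the `dns_resolver` line, about 27 seconds in total between sending the call and the DNS resolver returning.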

For reference only, this is the console output when I instantiate the client class:

D 2022-01-26T03:31:29.762Z | index | Loading @grpc/grpc-js version 1.5.1
D 2022-01-26T03:31:29.976Z | resolving_load_balancer | dns:192.168.10.10:50050 IDLE -> IDLE
D 2022-01-26T03:31:29.976Z | connectivity_state | (1) dns:192.168.10.10:50050 IDLE -> IDLE
D 2022-01-26T03:31:29.977Z | dns_resolver | Resolver constructed for target dns:192.168.10.10:50050
D 2022-01-26T03:31:29.978Z | channel | (1) dns:192.168.10.10:50050 Channel constructed with options {}

Thanks for the help and this module!

Regards, Normando

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

murgatroid99 commented, Jan 31, 2022

I was not able to reproduce it, but fortunately I think I found the bug anyway. I published @grpc/grpc-js version 1.5.4 with a change that I think fixes this bug. Can you try it out?

NormandoHall commented, Feb 1, 2022

Well, I can confirm that 1.5.4 fixes this issue! I tested many times with both 1.5.3 and 1.5.4. It also seems that 1.5.3 fails only when I connect to an IP address, not an FQDN, but I am not 100% sure about this. In any case, 1.5.4 definitely fixes the issue, and the error 14 response is also faster.

Thanks!
