
Sporadic cur != GRPC_CHANNEL_SHUTDOWN crashes the node process

See original GitHub issue

Problem description

The company I work for runs multiple services that communicate with each other using the Node gRPC implementation. We are sporadically seeing E0714 23:39:13.907064150 18 connectivity_state.cc:154] assertion failed: cur != GRPC_CHANNEL_SHUTDOWN errors that shut the service down completely. The GRPC_CHANNEL_SHUTDOWN errors are usually (though not always) preceded by other errors in the service, such as Postgres statement timeouts and Elasticsearch errors. However, those errors are handled, so they should definitely not shut the service down. The crash also appears to hit the service that usually handles more data than the others.

Reproduction steps

You can use the code in the repo here to reproduce the problem. It also describes the steps to reproduce.

Environment

  • OS name, version and architecture: Alpine Linux v3.11 docker running on Amazon Linux 2 x86_64
  • Node version: v12.16.0
  • Node installation method: docker
  • Package name and version: "grpc": "1.22.2" and "@grpc/grpc-js": "0.7.0"

Additional context

  • We are using Workers but the grpc server is created in the main thread.

  • Server Options:

{
    'grpc.max_send_message_length': 104857600,
    'grpc.max_receive_message_length': 104857600,
    'grpc.max_connection_idle_ms': 15000,
    'grpc.max_connection_age_ms': 30000,
    'grpc.keepalive_time_ms': 5000,
    'grpc.keepalive_timeout_ms': 1000,
    'grpc.keepalive_permit_without_calls': 1,
    // Allow grpc pings from client without data.
    // It must be 0 with Workers, otherwise it throws RESOURCE_EXHAUSTED.
    'grpc.http2.min_ping_interval_without_data_ms': 0
}
  • Client Options:
{
    'grpc.max_send_message_length': 104857600,
    'grpc.max_receive_message_length': 104857600,
    'grpc.keepalive_time_ms': 5000,
    'grpc.keepalive_timeout_ms': 1000,
    'grpc.keepalive_permit_without_calls': 1
}
  • Error:
Jul 15 09:10:37 ca-cqoexb data-service-be460c54df5d E0715 13:10:37.418067384      17 connectivity_state.cc:154]  assertion failed: cur != GRPC_CHANNEL_SHUTDOWN
Jul 15 09:10:37 ca-cqoexb data-service-be460c54df5d Aborted
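The server and client options above repeat the same size and keepalive numbers. A small helper (a sketch of my own, not from the issue; note that 104857600 bytes is 100 * 1024 * 1024, i.e. 100 MiB) keeps the shared values in one place:

```javascript
// Sketch: build the shared channel options used by both server and client.
// The numbers mirror the options listed in the issue above.
const MiB = 1024 * 1024;

function sharedOptions() {
  return {
    'grpc.max_send_message_length': 100 * MiB,    // 104857600
    'grpc.max_receive_message_length': 100 * MiB, // 104857600
    'grpc.keepalive_time_ms': 5000,
    'grpc.keepalive_timeout_ms': 1000,
    'grpc.keepalive_permit_without_calls': 1,
  };
}

function serverOptions() {
  return {
    ...sharedOptions(),
    'grpc.max_connection_idle_ms': 15000,
    'grpc.max_connection_age_ms': 30000,
    // Must be 0 with Workers, otherwise pings throw RESOURCE_EXHAUSTED.
    'grpc.http2.min_ping_interval_without_data_ms': 0,
  };
}
```

Either object can then be passed as the options argument when constructing the server or client, exactly as in the listings above.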

[UPDATE]

I found that gRPC core removed the assertion at connectivity_state.cc:154 that is causing the issue. However, that removal was only released in core version 1.25. Here is the link to the PR that removed the line, with an explanation of the changes. Is there an ETA for releasing version 1.25 of grpc-node, and if not, what is our best option, assuming it would fix the problem?
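Until a release containing the fix is available, one stopgap (my assumption, not maintainer guidance) is to fail fast at startup if the installed package predates whichever release ships it. A minimal dotted-version comparison is enough for that:

```javascript
// Sketch: compare dotted version strings, e.g. to gate startup on a
// minimum grpc release once the fix ships (the minimum shown below is
// an assumption for illustration, not a confirmed release number).
function versionAtLeast(installed, required) {
  const a = installed.split('.').map(Number);
  const b = required.split('.').map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true;
}

// Example: the issue pins "grpc": "1.22.2", which predates the core fix.
console.log(versionAtLeast('1.22.2', '1.25.0')); // false
```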

[UPDATE 2] I was able to reproduce the crash and have updated the Reproduction steps section. Please note that removing max-age will cause RST_STREAMs when working behind an NLB.
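If max-age stays enabled, clients behind an NLB can see transient RST_STREAM failures when the server recycles connections. A generic retry wrapper can absorb those; this is a sketch of my own, where callFn and the choice of retryable codes are assumptions, not something from the issue (status code 14 is UNAVAILABLE in gRPC, the usual code for a dropped connection):

```javascript
// Sketch: retry a unary call a few times when the failure looks transient
// (e.g. a connection recycled by grpc.max_connection_age_ms behind an NLB).
async function callWithRetry(callFn, { retries = 3, retryableCodes = [14] } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await callFn();
    } catch (err) {
      lastErr = err;
      // Anything other than a listed code is rethrown immediately.
      if (!retryableCodes.includes(err.code)) throw err;
      // Brief linear backoff before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, 50 * (attempt + 1)));
    }
  }
  throw lastErr;
}
```

Here callFn would wrap the actual stub call in a Promise; only the listed status codes are retried, so handled application errors still surface to the caller.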

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
murgatroid99 commented, Jul 15, 2020

The list of supported options can be found here: https://github.com/grpc/grpc-node/blob/master/PACKAGE-COMPARISON.md.

0 reactions
murgatroid99 commented, May 7, 2021

The grpc package is now deprecated, so this change will likely not happen. We recommend switching to @grpc/grpc-js.

