No great pattern for doing a graceful shutdown with Apollo Server integration packages

Hello,

After updating to 2.22 (see PR #4981) and applying the changelog's recommendation to insert await server.start() between server = new ApolloServer() and server.applyMiddleware, we started observing that Apollo now listens for termination signals itself and stops handling in-flight requests, responding with:

{"errors": [{
  "message": "Cannot execute GraphQL operations after the server has stopped.",
  "extensions": {"code":"INTERNAL_SERVER_ERROR"}
}]}

We were already handling these signals and calling the express close method (which does not abort in-flight requests but rather stops accepting new ones and waits for the others to finish).

My impression was that when using a middleware integration such as Express, rather than the standalone Apollo Server, these signals should not be handled by Apollo itself. At least they were not prior to 2.22.x.

To work around this issue we explicitly set stopOnTerminationSignals: false, and that seems to have resolved it.

Some context: we deploy to a Kubernetes Deployment that does rolling updates. After the new version has started, Kubernetes sends a termination signal to the old version. Upon receiving this signal we make the readiness probe fail so that no new requests are routed to the pod, but keep Express up for some more time until the in-flight requests have finished (or a timeout is triggered).
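
The shutdown sequence described above can be sketched roughly as follows. This is a minimal illustration using only Node's built-in http module, not the poster's actual code and not part of Apollo Server. The names createGracefulShutdown, readinessHandler, installSignalHandlers, and the 10-second default timeout are all hypothetical. It assumes the application owns signal handling (for example because stopOnTerminationSignals: false was set) and that the server passed in is the http.Server returned by Express's app.listen().

```typescript
import * as http from "http";

type Outcome = "drained" | "timeout";

// Hypothetical helper type; not an Apollo Server or Express API.
interface GracefulShutdown {
  isReady: () => boolean;
  shutdown: () => Promise<Outcome>;
}

function createGracefulShutdown(
  server: http.Server,
  timeoutMs = 10_000 // assumed default; tune to your termination grace period
): GracefulShutdown {
  let ready = true;
  let pending: Promise<Outcome> | undefined;

  const shutdown = (): Promise<Outcome> => {
    // Repeated signals share the same shutdown attempt.
    if (pending) return pending;

    // From now on the readiness probe should fail, so no new traffic is routed here.
    ready = false;

    const p = new Promise<Outcome>((resolve) => {
      // Give up waiting once the timeout elapses.
      const timer = setTimeout(() => resolve("timeout"), timeoutMs);
      // close() stops accepting new connections; the callback runs
      // once the existing connections have ended.
      server.close(() => {
        clearTimeout(timer);
        resolve("drained");
      });
    });
    pending = p;
    return p;
  };

  return { isReady: () => ready, shutdown };
}

// Readiness endpoint: 200 while serving, 503 once shutdown has begun.
function readinessHandler(gs: GracefulShutdown): http.RequestListener {
  return (_req, res) => {
    res.statusCode = gs.isReady() ? 200 : 503;
    res.end();
  };
}

// Wire the helper to the termination signals the application handles itself.
function installSignalHandlers(gs: GracefulShutdown): void {
  for (const signal of ["SIGTERM", "SIGINT"] as const) {
    process.once(signal, () => {
      gs.shutdown().then((outcome) => {
        process.exit(outcome === "drained" ? 0 : 1);
      });
    });
  }
}
```

In an Apollo setup, one would presumably call the Apollo server's stop() only after shutdown() resolves, so that in-flight operations are not rejected while they drain. Note that idle keep-alive connections can delay the close callback on some Node versions, which is one reason the timeout is there.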

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 12 (9 by maintainers)

Top GitHub Comments

kevin-lindsay-1 commented, Aug 9, 2021 (1 reaction)

I would like to throw this out there just so that it’s acknowledged: right now it looks like stop also shuts down the health check endpoint, which can then cause readiness probes to fail in Kubernetes, which then prevents the request from completing.

I haven’t looked into the code, so maybe I’m missing something, but it seems like that’s the case, and if so it would be good to keep in mind that in Kubernetes the health check should stay active until the pod is ready to be removed. At least, that’s what I intuit, as once a pod enters the terminating state it’s generally considered to have finished cleaning up once it stops being ready.

Something else could be going on; I’ll follow up after I figure out what’s up with this.


Edit: Upon further examination, the issue was caused by Istio needing a pod annotation of:

# https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#ProxyConfig
proxy.istio.io/config: |
  terminationDrainDuration: {{ $terminationGracePeriodSeconds }}s

glasser commented, Jul 16, 2021 (1 reaction)

(starting on the inline-and-improve-stoppable project at https://github.com/apollographql/apollo-server/pull/5498 )

Top Results From Across the Web

API Reference: Drain HTTP server plugin - Apollo GraphQL
We highly recommend using this plugin to ensure your server shuts down gracefully. You do not need to use this plugin with the...

Error handling - Apollo GraphQL Docs
If Apollo Server hasn't correctly started up or is in the process of shutting down, it responds with a 500 status code. The...

API Reference: ApolloServer - Apollo GraphQL Docs
A key-value cache that Apollo Server uses to store previously encountered GraphQL operations (as DocumentNodes). It does not store query results.

Choosing an Apollo Server package
Apollo Server is distributed as a collection of different packages for different environments and web frameworks. You can choose which package to use...

Apollo Server plugin event reference - Apollo GraphQL Docs
When your serverWillStop handler is called, Apollo Server is in a state where it will no longer start to execute new GraphQL operations,...