
Random timeout/deadline exceeded unhandled rejections

Environment details

  • OS: Cloud Functions environment (Ubuntu 18.04)
  • Node.js version: 16
  • npm version: Not sure, the one used by Cloud Functions
  • @google-cloud/spanner version: 6.1.2, but the issue has been happening since versions 5.x

Steps to reproduce

Run a Cloud Function using the Spanner client with an opened database, even without making requests.
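
For reference, the setup is roughly the following (project, instance and database names are placeholders): the client and database handle are created once at module scope, as is usual for Cloud Functions, and then mostly sit idle between invocations.

const functions = require('@google-cloud/functions-framework');
const { Spanner } = require('@google-cloud/spanner');

// Created once per instance and kept open for the lifetime of the instance.
const spanner = new Spanner({ projectId: 'my-project' });
const database = spanner.instance('my-instance').database('my-database');

functions.http('handler', async (req, res) => {
  // The business code itself runs rarely and never sees the errors;
  // the unhandled rejections show up while the instance is idle.
  const [rows] = await database.run({ sql: 'SELECT 1' });
  res.json(rows);
});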

Context

I’ve been using the Node.js Spanner client across several projects for about a year, and I’ve consistently noticed DEADLINE_EXCEEDED and other related errors that aren’t directly tied to my requests, usually when the client hasn’t been used for several minutes. By that I mean the errors surface as unhandled rejections in Node.js. Cloud Functions makes this more visible, because an unhandled rejection makes the functions framework crash and log the error.

This happens in many of my Cloud Functions performing various unrelated tasks with Spanner. Business code is never impacted, as the unhandled rejections usually occur after the Cloud Function has been inactive for several minutes (or even several hours). It also means the bug is much more visible when the minimum number of instances is set to >= 1, because the instances stay idle longer (or even indefinitely). I see no reason why this wouldn’t happen in other environments, but I’ve mainly been using Cloud Functions lately and can only speak to the behaviour in this context.

All the Cloud Functions show fairly low activity (most of the time they sit at min_instances: 1 and occasionally reach 5 instances). The Cloud Spanner instance runs with 200 processing units and is underutilised (it never goes above 10% CPU).

The problem here is really about the rejections not being caught by the client. I’m not really worried about the errors themselves as they never seem to occur when running business operations/transactions (or at least they are being caught and properly retried in this case).

My only lead is the keepAlive mechanism, as it uses setInterval and periodically makes requests to Spanner, especially when sessions aren’t being used. However, after looking at the code I haven’t found anything obvious.
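
To illustrate what I mean, here is a simplified sketch of the kind of pattern that would produce this. It is not the session pool’s actual code, and the interval is made up; it just shows how a periodic keep-alive query whose promise is neither awaited nor given a .catch() handler turns a timeout into an unhandled rejection.

// Simplified illustration only; this is not the session pool's real code.
// Stand-in for the pool pinging an idle session (e.g. with a SELECT 1).
async function pingSession(database) {
  await database.run({ sql: 'SELECT 1' });
}

function startKeepAlive(database) {
  setInterval(() => {
    // Fire-and-forget: if the ping times out after the instance has been idle
    // (DEADLINE_EXCEEDED, CANCELLED), the rejection reaches no handler and the
    // functions framework logs it and exits with code 16.
    pingSession(database);
  }, 30 * 60 * 1000); // interval value is made up for the example
}

// Attaching a handler would turn the same failure into a logged warning:
function startKeepAliveHandled(database) {
  setInterval(() => {
    pingSession(database).catch((err) =>
      console.warn('keep-alive ping failed:', err.message));
  }, 30 * 60 * 1000);
}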

Stack traces

Unfortunately, as these are unhandled rejections, the stack traces are usually not very helpful, but one of them clearly references the Spanner client. To give a rough idea of the rate of occurrence: it has happened 33 times in 10 days across around 100 Cloud Functions, most of which have min_instances: 1 set. Of the 33 errors, 4 are due to CANCELLED, 3 are Total timeout of API..., and the rest are DEADLINE_EXCEEDED.

Unhandled rejection
Error: Total timeout of API google.spanner.v1.Spanner exceeded 60000 milliseconds before any response was received.
    at repeat (/workspace/node_modules/google-gax/build/src/normalCalls/retries.js:66:31)
    at Timeout._onTimeout (/workspace/node_modules/google-gax/build/src/normalCalls/retries.js:101:25)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7)
Error: Process exited with code 16
    at process.<anonymous> (/workspace/node_modules/@google-cloud/functions-framework/build/src/invoker.js:92:22)
    at process.emit (node:events:527:28)
    at process.emit (node:domain:475:12)
    at process.exit (node:internal/process/per_thread:189:15)
    at sendCrashResponse (/workspace/node_modules/@google-cloud/functions-framework/build/src/logger.js:44:9)
    at process.<anonymous> (/workspace/node_modules/@google-cloud/functions-framework/build/src/invoker.js:88:44)
    at process.emit (node:events:527:28)
    at process.emit (node:domain:475:12)
    at emit (node:internal/process/promises:140:20)
    at processPromiseRejections (node:internal/process/promises:274:27)
Unhandled rejection
Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
    at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:189:52)
    at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:111:35
    at Object.onReceiveStatus (/workspace/node_modules/grpc-gcp/build/src/index.js:73:29)
    at InterceptingListenerImpl.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:106:23)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:365:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
    at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
    at processTicksAndRejections (node:internal/process/task_queues:78:11)
    at runNextTicks (node:internal/process/task_queues:65:3)
// Same Error: Process exited with code 16 coming from the functions-framework.
Unhandled rejection
Error: 1 CANCELLED: Call cancelled
    at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:189:52)
    at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:111:35
    at Object.onReceiveStatus (/workspace/node_modules/grpc-gcp/build/src/index.js:73:29)
    at InterceptingListenerImpl.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:106:23)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:365:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
    at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
    at processTicksAndRejections (node:internal/process/task_queues:78:11)
// Same Error: Process exited with code 16 coming from the functions-framework.

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments: 24 (9 by maintainers)

Top GitHub Comments

1 reaction
asthamohta commented, Jul 21, 2022

This makes sense. Give me some time to figure out which part of the code the error should get caught in and I will add code to catch this. I am going to be on vacation for a week but I will look into this when I get back.

0 reactions
flovouin commented, Sep 4, 2022

@dannnnthemannnn To be honest, I wouldn’t call these parameters production-ready. It just consists of setting a parameter that wasn’t made for this to an extremely high value, and setting it too high might even have some side effects.
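
For concreteness, a workaround of that shape looks roughly like the following. This is only a sketch: the exact parameter and values are illustrative (I’m using the session pool’s keep-alive interval here), so check them against the client’s SessionPoolOptions documentation before copying anything.

const { Spanner } = require('@google-cloud/spanner');

const spanner = new Spanner();
// Illustrative only: pushing the pool's keep-alive ping interval (in minutes)
// extremely high so the background pings effectively never run while the
// instance idles. Not what the option was designed for, hence "not production
// ready".
const database = spanner.instance('my-instance').database('my-database', {
  keepAlive: 24 * 60,
});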

@asthamohta Unfortunately the error has occurred again after a few days. It’s not as frequent as before, but it still occurs. Also, because the errors are still unhandled, they still make noise in Error Reporting / Cloud Logging / our alerting system.

The diagnosis (about Cloud Functions killing background activity) makes sense, and I don’t really mind that the keep-alive mechanism cannot work in this case. But I’d suggest fixing the error handling as we discussed, because that’s unfortunately something we cannot work around outside the Spanner client itself. Also, although I’m aware it is more work, it might be useful to have a proper parameter for disabling the keep-alive mechanism when it’s expected not to work. Currently, working around it with the existing parameters does not seem to work 100% of the time, and that makes it hard to know whether there is actually something more to look into or whether the existing parameters simply weren’t made for this.
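
As a side note, this is also why we cannot simply handle it ourselves: the functions framework registers its own unhandledRejection listener (the invoker.js / sendCrashResponse frames in the traces above), and Node.js invokes every registered listener, so adding one of our own only adds logging and does not stop the framework’s listener from terminating the instance.

// Adding our own listener does not prevent the crash: the framework's listener
// (registered in invoker.js) still runs and ends up calling process.exit(16).
process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection from Spanner background activity:', reason);
});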
