Random DEADLINE_EXCEEDED errors emitted for modifyAckDeadline and acknowledge
Environment details
- OS: Ubuntu 18.04
- Node.js version: 10.11.0
- npm version: 6.5.0
- @google-cloud/pubsub version: 0.28.1
We are processing about 300 messages per second from a subscription, and about once or twice a day we randomly get DEADLINE_EXCEEDED errors emitted like these:
Failed to "acknowledge" for 55 message(s). Reason: 4 DEADLINE_EXCEEDED: Deadline Exceeded
and
Failed to "modifyAckDeadline" for 63 message(s). Reason: 4 DEADLINE_EXCEEDED: Deadline Exceeded
Steps to reproduce
A minimal setup is something like this:
```js
const { PubSub } = require('@google-cloud/pubsub');

const subscription = new PubSub({...}).subscription(subscriptionName);
subscription.on(`message`, message => processMessage());
subscription.on(`error`, error => console.log(error));
```
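Since the failing RPCs are acknowledge and modifyAckDeadline, here is a slightly fuller sketch of the same flow with the ack call shown explicitly. processMessage is a stub standing in for our real handler, and the subscription name is illustrative:

```js
const { PubSub } = require('@google-cloud/pubsub');

const subscriptionName = 'my-subscription'; // illustrative
const subscription = new PubSub().subscription(subscriptionName);

// Stand-in for our real handler (~150 ms typical, peaks around 1500 ms).
async function processMessage(message) {
  // ... do the actual work ...
}

subscription.on('message', async message => {
  try {
    await processMessage(message);
    message.ack();  // triggers the "acknowledge" RPC that sometimes fails
  } catch (err) {
    message.nack(); // triggers modifyAckDeadline with a deadline of 0
  }
});

subscription.on('error', error => console.log(error));
```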
Our processMessage function usually takes around 150 ms to run, with peaks of up to 1500 ms.
The acknowledgement deadline for the subscription in cloud console is set to 600 seconds.
We’ve looked through similar issues and tried experimenting with setting

```js
batching: {
  callOptions: {
    timeout: 600000
  }
}
```

as a subscription option, as described in #240. We also tried setting the ackDeadline subscription option, but neither seemed to help.
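Concretely, the options we experimented with looked roughly like this; the client construction and subscription name are repeated for completeness, and the ackDeadline value shown is illustrative rather than a recommendation:

```js
// Subscription options we tried, per the suggestion in #240.
// Neither the gax call timeout nor the client-side ackDeadline
// made the errors go away.
const { PubSub } = require('@google-cloud/pubsub');

const subscriptionName = 'my-subscription'; // illustrative
const subscription = new PubSub().subscription(subscriptionName, {
  ackDeadline: 600, // seconds (value here is illustrative)
  batching: {
    callOptions: {
      timeout: 600000 // ms, passed through to the underlying gax/grpc calls
    }
  }
});
```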
We’ve also looked through the source code of this repo, but couldn’t figure out much, other than that these errors come from the MessageQueues for ack and nack messages, and seem to be coming through google-gax from somewhere in grpc.
Locally I can make the client emit these errors if I set { batching: { callOptions: { timeout: 1 }}}, but in production this value is set much higher.
We could just ignore these errors, but it would feel better if someone could give some tips on how to find the root cause or what could be going wrong.
Top GitHub Comments
@jkwlui I’m not sure this should be closed, as this should ideally be handled internally. For us, these errors cause the subscription’s close() handler to be called after the error handler fires with the DEADLINE_EXCEEDED message. This puts the application in a state of not receiving messages at all. It was actually somewhat hard to detect on our end and cost a non-trivial amount of time to identify (the unhandled exception was in the log messages, but it was a needle in a haystack). In the meantime, can you advise on how to retry after the close() handler is called on an unexpected exit, or on what to do in the error() handler to prevent the exit? Currently we just exit the process and let Kubernetes recreate the pod. If there’s a cleaner way to handle this it would be helpful to know. Also, if this is known behavior, then perhaps the example code for handling subscriptions should cover these cases and how to respond correctly, since I assume most Pub/Sub use is long-running.

We are facing the same issue; this error puts the app in a state of not receiving messages at all. Any workaround?
Thanks,
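For what it’s worth, a rough sketch of the retry approach asked about above: close the subscription from the error handler and create a fresh one after a short delay, instead of exiting the process. The listen() wrapper, the subscription name, and the 5 s back-off are illustrative assumptions, not documented library guidance:

```js
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();
const subscriptionName = 'my-subscription'; // illustrative

// listen() is our own wrapper, not a library API: it (re)creates the
// subscription object and wires up the handlers, so it can be called again
// from the error handler instead of exiting the process.
function listen() {
  const subscription = pubsub.subscription(subscriptionName);

  subscription.on('message', message => {
    // ... process, then ack ...
    message.ack();
  });

  subscription.on('error', error => {
    console.error('subscription error', error);
    if (error.code === 4) { // 4 = DEADLINE_EXCEEDED, as seen in the logs above
      subscription.removeAllListeners();
      subscription.close()
        .catch(() => {})                       // ignore errors while shutting down
        .then(() => setTimeout(listen, 5000)); // back-off value is arbitrary
    }
  });
}

listen();
```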