Random DEADLINE_EXCEEDED errors emitted for modifyAckDeadline and acknowledge
Environment details
- OS: Ubuntu 18.04
- Node.js version: 10.11.0
- npm version: 6.5.0
- @google-cloud/pubsub version: 0.28.1
We are processing about 300 messages per second from a subscription, and about once or twice a day we randomly get DEADLINE_EXCEEDED errors emitted like these:
Failed to "acknowledge" for 55 message(s). Reason: 4 DEADLINE_EXCEEDED: Deadline Exceeded
and
Failed to "modifyAckDeadline" for 63 message(s). Reason: 4 DEADLINE_EXCEEDED: Deadline Exceeded
Steps to reproduce
A minimal setup is something like this:
```js
const { PubSub } = require('@google-cloud/pubsub');

const subscription = new PubSub({...}).subscription(subscriptionName);
subscription.on(`message`, message => processMessage());
subscription.on(`error`, error => console.log(error));
```
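Since the failing RPCs are acknowledge and modifyAckDeadline, here is a slightly fuller sketch of the same flow with the ack call shown explicitly. processMessage is a stub standing in for our real handler, and the subscription name is illustrative:

```js
const { PubSub } = require('@google-cloud/pubsub');

const subscriptionName = 'my-subscription'; // illustrative
const subscription = new PubSub().subscription(subscriptionName);

// Stand-in for our real handler (~150 ms typical, peaks around 1500 ms).
async function processMessage(message) {
  // ... do the actual work ...
}

subscription.on('message', async message => {
  try {
    await processMessage(message);
    message.ack();  // triggers the "acknowledge" RPC that sometimes fails
  } catch (err) {
    message.nack(); // triggers modifyAckDeadline with a deadline of 0
  }
});

subscription.on('error', error => console.log(error));
```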
Our processMessage function usually takes around 150 ms to run, with peaks of up to 1500 ms.
The acknowledgement deadline for the subscription in cloud console is set to 600 seconds.
We’ve looked through similar issues and tried experimenting with setting

```js
batching: {
  callOptions: {
    timeout: 600000
  }
}
```

as a subscription option, as described in #240. We also tried setting the ackDeadline subscription option, but neither seemed to help.
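Concretely, the options we experimented with looked roughly like this; the client construction and subscription name are repeated for completeness, and the ackDeadline value shown is illustrative rather than a recommendation:

```js
// Subscription options we tried, per the suggestion in #240.
// Neither the gax call timeout nor the client-side ackDeadline
// made the errors go away.
const { PubSub } = require('@google-cloud/pubsub');

const subscriptionName = 'my-subscription'; // illustrative
const subscription = new PubSub().subscription(subscriptionName, {
  ackDeadline: 600, // seconds (value here is illustrative)
  batching: {
    callOptions: {
      timeout: 600000 // ms, passed through to the underlying gax/grpc calls
    }
  }
});
```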
We’ve also looked through the source code of this repo, but couldn’t figure out much, other than that these errors come from the MessageQueues for ack and nack messages, and seem to be coming through google-gax from somewhere in grpc.
Locally I can make the client emit these errors if I set { batching: { callOptions: { timeout: 1 }}}, but in production this value is set much higher.
We could just ignore these errors, but it would feel better if someone could give some tips on how to find the root cause or what could be going wrong.
Top GitHub Comments
@jkwlui I’m not sure this should be closed, as this should ideally be handled internally. For us, these errors cause the subscription’s close() handler to be called after the error handler fires with the DEADLINE_EXCEEDED message. This puts the application in a state of not receiving messages at all. It was actually somewhat hard to detect on our end and cost a non-trivial amount of time to identify (the unhandled exception was in the log messages, but it was a needle in a haystack). In the meantime, can you advise on how to retry after the close() handler is called on an unexpected exit, or on what to do in the error() handler to prevent the exit? Currently we just exit the process and let Kubernetes recreate the pod. If there’s a cleaner way to handle this it would be helpful to know. Also, if this is known behavior, then perhaps the example code for handling subscriptions should cover these cases and how to respond correctly, since I assume most Pub/Sub use is long-running.

We are facing the same issue; this error puts the app in a state of not receiving messages at all. Any workaround?
Thanks,
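For what it’s worth, a rough sketch of the retry approach asked about above: close the subscription from the error handler and create a fresh one after a short delay, instead of exiting the process. The listen() wrapper, the subscription name, and the 5 s back-off are illustrative assumptions, not documented library guidance:

```js
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();
const subscriptionName = 'my-subscription'; // illustrative

// listen() is our own wrapper, not a library API: it (re)creates the
// subscription object and wires up the handlers, so it can be called again
// from the error handler instead of exiting the process.
function listen() {
  const subscription = pubsub.subscription(subscriptionName);

  subscription.on('message', message => {
    // ... process, then ack ...
    message.ack();
  });

  subscription.on('error', error => {
    console.error('subscription error', error);
    if (error.code === 4) { // 4 = DEADLINE_EXCEEDED, as seen in the logs above
      subscription.removeAllListeners();
      subscription.close()
        .catch(() => {})                       // ignore errors while shutting down
        .then(() => setTimeout(listen, 5000)); // back-off value is arbitrary
    }
  });
}

listen();
```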