
Pubsub messages always expire after 30 minutes

See original GitHub issue

We have the following setup for processing pubsub messages (some of which can require a large amount of processing - up to 2 or 3 hours!):

  • A pubsub subscription with an acknowledgement deadline of 600s (10 minutes)
  • A maximum acknowledgement deadline (including extensions) of 8 hours, set using setMaxAckExtensionPeriod on the Subscriber.Builder
  • A policy to renew the acknowledgement deadline of the message 5 minutes before expiry, set using setAckExpirationPadding on the Subscriber.Builder
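As a configuration sketch, the setup above might look roughly like this with the Java client as its builder API stood around the time of this issue (the project and subscription names are made up, and note that setAckExpirationPadding was later removed from the builder, per the maintainer comments in this thread):

```java
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.SubscriptionName;
import org.threeten.bp.Duration;

public class LongRunningSubscriber {
  public static void main(String[] args) {
    // Hypothetical names; the 600s ack deadline itself lives on the
    // subscription, set when the subscription was created.
    SubscriptionName subscription = SubscriptionName.of("my-project", "my-subscription");

    MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
      try {
        process(message); // may run for 2-3 hours
        consumer.ack();
      } catch (RuntimeException e) {
        consumer.nack();
      }
    };

    Subscriber subscriber = Subscriber.newBuilder(subscription, receiver)
        // keep extending the ack deadline for at most 8 hours in total
        .setMaxAckExtensionPeriod(Duration.ofHours(8))
        // renew 5 minutes before expiry (removed in later client versions)
        .setAckExpirationPadding(Duration.ofMinutes(5))
        .build();
    subscriber.startAsync().awaitRunning();
  }

  static void process(PubsubMessage message) { /* long-running work */ }
}
```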

Under these circumstances, I would expect:

  • The subscriber to accept the message from the queue, with an initial deadline of 10 minutes.
  • The subscriber to renew the acknowledgement deadline for another 10 minutes every 5 minutes, until one of the following happens:
  1. the process function acks/nacks the message;
  2. the process function fails, and the deadline (no longer being renewed) expires 10 minutes later; or
  3. 8 hours of extensions elapse, so the max extension period is reached and the deadline can no longer be extended.
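As a self-contained sanity check of that expectation (no client library involved; the numbers are just the settings listed above), the lease timeline we were assuming can be modeled like this:

```java
// Model of the expected ack-deadline timeline, in minutes since receipt.
// This is our mental model of the renewal loop, not the client's scheduler.
public class LeaseModel {
  static final int INITIAL_DEADLINE = 10;    // subscription ack deadline
  static final int RENEW_BEFORE_EXPIRY = 5;  // padding: renew 5 min early
  static final int MAX_EXTENSION = 8 * 60;   // setMaxAckExtensionPeriod

  /** Minute at which the lease finally lapses if processing never finishes. */
  public static int expectedExpiryMinute() {
    int deadline = INITIAL_DEADLINE;
    while (true) {
      int renewAt = deadline - RENEW_BEFORE_EXPIRY; // time of next renewal
      if (renewAt >= MAX_EXTENSION) {
        return deadline; // extensions exhausted; lease is allowed to lapse
      }
      deadline = renewAt + INITIAL_DEADLINE; // renewed for another 10 minutes
    }
  }

  public static void main(String[] args) {
    // Under this model the lease survives until just past the 8-hour mark,
    // nowhere near 30 minutes.
    System.out.println(expectedExpiryMinute()); // prints 485
  }
}
```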

What we actually see is the message going back on the queue after 30 minutes. We have tried processing a long-running message three times, and every single time the message gets picked back up off the queue by a worker 30 minutes later. I just can’t understand why this would be the case: we never mention 30 minutes in our settings, and I can’t see 30 minutes in the defaults anywhere (for example, the default max extension period is 60 minutes).

It’s entirely possible I’ve completely misunderstood the way acknowledgement deadline renewal is supposed to work and am barking up the wrong tree entirely (if so, deepest apologies), but I’m at my wits’ end trying to understand what’s going on here!

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 23 (9 by maintainers)

Top GitHub Comments

1 reaction
kir-titievsky commented, Jan 9, 2018

So, I think the root of the issue is that streamingPull connections get closed after something like 30 seconds of inactivity and can’t survive for longer than 30 minutes. Re-building the connections leads to elevated duplicate message delivery. The server-side fixes to mitigate that are rolling out in the next couple of weeks.

This has two possible implications for the client library:

  • If we had exposed to users the parameter that controls the maximum delay between modAcks, we could have had a very simple mitigation for the above condition (e.g. send modAcks no less frequently than every 25 seconds, rather than at the 99th percentile).
  • There is a case for keeping the pull-based implementation around, particularly if it’s still in the code. Having it as a user-configurable fallback might be an easy workaround for tasks that require more than 30 minutes.
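The first mitigation above amounts to clamping the gap between modAcks so the stream never goes quiet long enough to be closed. A small sketch of that idea (the 25-second figure comes from the comment; the helper name and structure are mine, not the client’s):

```java
public class ModAckSchedule {
  // Streams reportedly idle out after ~30s of inactivity, so never let
  // more than this many seconds pass between modAcks.
  static final long MAX_QUIET_SECONDS = 25;

  /**
   * Delay until the next modAck: normally driven by the stream deadline
   * (e.g. 99th-percentile processing time) minus the padding, but clamped
   * so the connection never sits idle long enough to be dropped.
   */
  public static long nextModAckDelaySeconds(long deadlineSeconds, long paddingSeconds) {
    long deadlineDriven = deadlineSeconds - paddingSeconds;
    return Math.min(deadlineDriven, MAX_QUIET_SECONDS);
  }

  public static void main(String[] args) {
    // 10-minute deadline, 5s padding: without the clamp we would wait 595s
    // between modAcks; with it, one goes out every 25s.
    System.out.println(nextModAckDelaySeconds(600, 5)); // prints 25
  }
}
```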
1 reaction
pongad commented, Nov 14, 2017

@hairybreeches You seem to understand most things correctly (see 1a below). There are two known issues that might explain this behavior.

  1. The latest release has a bug where setting the padding to >= 2 seconds can cause us to send many modify-deadline requests in quick succession. Modifying the deadline so often could cause the pubsub server to misbehave. This was fixed by #2604.

One caveat: The linked PR removed the ability for you to set padding time. It is currently set to 5 seconds, which should be more than enough for the request to reach the server. The padding config led to quite a few bugs in the past. In my opinion, it is better to make sure the preset value works well. Please let us know if you think this should work for you.

1a. You said in the issue that you set the deadline to 10 minutes. This configures the deadline for messages retrieved using the old “polling pull” endpoint. The Pubsub team has asked us to move to “streaming pull”. In this new endpoint, the deadline is configured per connection. Theoretically, you can have two machines pulling messages from the same topic, with one setting the deadline to 1m and the other to 10m. Currently the deadline is set to 1m (with 5s padding, we extend every 55s). Later, we will make the client lib record how long you take to process messages and adjust the deadline accordingly; after that, the lib should automatically optimize the deadline without you having to configure anything. In the meantime, this will slightly increase your network usage (we send modify-deadlines more often than ideal), but it shouldn’t matter greatly: the modify-deadline messages are quite small and you don’t get billed for them.

  2. The pubsub service has a message-duplication problem on the server, tracked in #2465. The fix for this is being rolled out this week.

Compared to other issues reported, your workload is much longer-running, so it could be running into an undiagnosed problem. Let’s see if the fixes for 1 and 2 help first, but I’ll keep this issue open for now.
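The arithmetic in 1a (a 1-minute streaming-pull deadline with a fixed 5-second padding) works out to an extension every 55 seconds. A small sketch of that cadence, with a rough modAck count for a long-running message (the helper names are mine, not the client’s, and the count ignores edge effects around the first deadline):

```java
public class StreamingPullCadence {
  // Values from the comment above: per-connection deadline and fixed padding.
  static final long STREAM_DEADLINE_SECONDS = 60;
  static final long PADDING_SECONDS = 5;

  /** Interval between extensions: extend `padding` seconds before expiry. */
  public static long extensionIntervalSeconds() {
    return STREAM_DEADLINE_SECONDS - PADDING_SECONDS; // 55s
  }

  /** Rough number of modAck requests sent while one message is processed. */
  public static long modAcksForProcessingTime(long processingSeconds) {
    long interval = extensionIntervalSeconds();
    return (processingSeconds + interval - 1) / interval; // ceiling division
  }

  public static void main(String[] args) {
    System.out.println(extensionIntervalSeconds());      // prints 55
    // A 2-hour job (7200s) costs on the order of 131 small modAck requests,
    // which is why the extra network usage shouldn't matter much.
    System.out.println(modAcksForProcessingTime(7200));  // prints 131
  }
}
```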

