Publishing messages burst timeout

Environment details

  • OS: Google Kubernetes Container
  • Node.js version: 12.15.0
  • npm version: -
  • @google-cloud/pubsub version: 2.18.3

Steps to reproduce

  1. ?
  2. ?

We’re seeing an issue in our production environment. It happens inconsistently, so I’m not sure exactly how to reproduce it.

This service continuously publishes messages to a couple of topics, at a volume of around 1 MiB per second. The errors come in bursts rather than continuously, and they come from a single pod at a time (we run about 150 pods for this service). For example, we’ll see a burst of ~5k errors across all of the topics from pod A, and the next day the same thing from pod B. A burst happens every few hours or days. Rolling out the deployment or killing the offending pod resolves the errors for at least a few hours. The errors don’t go away on their own quickly; they have not cleared up within 20 minutes.
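The reporter’s workaround of killing the pod that produces the errors could in principle be automated. Below is a minimal sketch, assuming a pod can report its own health (for example through a Kubernetes liveness probe). `FailureBurstDetector` is a hypothetical helper and is not part of @google-cloud/pubsub. It counts failed publishes inside a sliding time window. The window length and threshold in the example are illustrative assumptions, not values from the issue. This approach only mitigates the symptom and does not address the cause.

```typescript
// Hypothetical helper (not part of @google-cloud/pubsub): counts failed publishes
// inside a sliding time window so a pod can report itself unhealthy during a burst.
class FailureBurstDetector {
  private readonly timestamps: number[] = [];
  private readonly windowMs: number;
  private readonly threshold: number;

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Record one failed publish at time `nowMs` (milliseconds).
  // Assumes failures are recorded in non-decreasing time order.
  recordFailure(nowMs: number): void {
    this.timestamps.push(nowMs);
    this.evictOld(nowMs);
  }

  // True when at least `threshold` failures fall within the last `windowMs`.
  isBursting(nowMs: number): boolean {
    this.evictOld(nowMs);
    return this.timestamps.length >= this.threshold;
  }

  // Drop failures that have fallen out of the window; the oldest are at the front.
  private evictOld(nowMs: number): void {
    while (this.timestamps.length > 0) {
      const oldest = this.timestamps[0];
      if (oldest === undefined || nowMs - oldest < this.windowMs) return;
      this.timestamps.shift();
    }
  }
}
```

For example, the publish error handler could call `recordFailure(Date.now())`. A health endpoint (hypothetical here) could then return a failing status whenever `isBursting(Date.now())` is true, so Kubernetes restarts only the affected pod.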

BTW, the PubSub client instance is created once and reused for all subsequent publishes.

The error message and stack trace:

Error: Total timeout of API google.pubsub.v1.Publisher exceeded 600000 milliseconds before any response was received.
    at repeat (/deploy/my-project/node_modules/google-gax/build/src/normalCalls/retries.js:66:31)
    at Timeout._onTimeout (/deploy/my-project/node_modules/google-gax/build/src/normalCalls/retries.js:101:25)
    at listOnTimeout (internal/timers.js:531:17)
    at processTimers (internal/timers.js:475:7)

Thanks! Please let me know what other information would be helpful.
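The error says that google-gax kept retrying calls to the google.pubsub.v1.Publisher API until a total timeout of 600000 ms (10 minutes) ran out, and that no attempt received a response. The following is a simplified model of that idea. It is not google-gax’s actual implementation. The per-attempt timeout and backoff values are hypothetical, and every attempt is assumed to go unanswered.

```typescript
// Simplified model (not google-gax's code) of a call retried with exponential
// backoff under a total timeout, where no attempt ever gets a response.
interface RetrySettings {
  attemptTimeoutMs: number; // how long one attempt waits for a response (hypothetical)
  initialDelayMs: number;   // sleep before the first retry (hypothetical)
  delayMultiplier: number;  // growth factor applied to the sleep after each retry
  maxDelayMs: number;       // cap on a single sleep
  totalTimeoutMs: number;   // budget shared by all attempts and sleeps
}

interface Outcome {
  attempts: number;
  elapsedMs: number;
  error: string;
}

function simulateUnansweredCall(api: string, s: RetrySettings): Outcome {
  let elapsed = 0;
  let attempts = 0;
  let delay = s.initialDelayMs;
  while (elapsed < s.totalTimeoutMs) {
    // An attempt waits for its own timeout, but never past the remaining budget.
    elapsed += Math.min(s.attemptTimeoutMs, s.totalTimeoutMs - elapsed);
    attempts += 1;
    if (elapsed >= s.totalTimeoutMs) break;
    // Back off before the next attempt, again bounded by the remaining budget.
    elapsed += Math.min(delay, s.totalTimeoutMs - elapsed);
    delay = Math.min(delay * s.delayMultiplier, s.maxDelayMs);
  }
  return {
    attempts,
    elapsedMs: elapsed,
    error: `Total timeout of API ${api} exceeded ${s.totalTimeoutMs} milliseconds before any response was received.`,
  };
}
```

The model shows that the 600000 ms figure is a budget shared across all attempts, not the timeout of a single request. Each error in a burst therefore represents a publish that stayed unanswered for the full ten minutes. @google-cloud/pubsub accepts per-call options through the `gaxOpts` field of its publish options. Check the `PublishOptions` reference for your library version before you rely on overriding retry or timeout settings there.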

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 3
  • Comments: 17 (2 by maintainers)

Top GitHub Comments

1 reaction
githubwua commented, Jan 17, 2022

This looks like a client-side issue. All pods but one are able to send requests to Pub/Sub and get responses back. Removing the bad pod is a good temporary fix. To fix this for good, we need to know what the bad pod is doing with those failed requests. Was it unable to send the requests in the first place, or was it unable to receive the responses? The client may have exhausted its network connections. Or the connection to the Pub/Sub endpoint may have been dropped without the client noticing, so it keeps using the broken connection instead of establishing a new one. How resilient is your pod in handling error conditions? Can you share a code snippet showing how messages are published in the pod and how it handles errors?
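One pattern that follows the maintainer’s second hypothesis (the client keeps using a dead connection) is to keep sharing a single client, as the reporter already does, but replace it after several consecutive timeouts. Below is a minimal sketch under stated assumptions. `ResettableClient` is hypothetical. The client type is left generic. Timeouts are recognized by the error-message prefix shown in the issue. The threshold is an illustrative choice.

```typescript
// Hypothetical wrapper: shares one lazily created client across all publishes and
// replaces it after `maxConsecutiveTimeouts` total-timeout errors in a row.
class ResettableClient<T> {
  private client: T | undefined = undefined;
  private consecutiveTimeouts = 0;
  private readonly create: () => T;
  private readonly discard: (client: T) => void;
  private readonly maxConsecutiveTimeouts: number;

  constructor(create: () => T, discard: (client: T) => void, maxConsecutiveTimeouts: number) {
    this.create = create;
    this.discard = discard;
    this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
  }

  // Returns the shared client, creating it on first use or after a reset.
  get(): T {
    if (this.client === undefined) {
      this.client = this.create();
    }
    return this.client;
  }

  // A successful publish shows the current client still works.
  reportSuccess(): void {
    this.consecutiveTimeouts = 0;
  }

  // Only total-timeout errors count; other errors leave the counter unchanged.
  // Assumption: the timeout is identified by its message prefix.
  reportFailure(err: Error): void {
    if (!err.message.startsWith("Total timeout of API")) return;
    this.consecutiveTimeouts += 1;
    if (this.consecutiveTimeouts >= this.maxConsecutiveTimeouts && this.client !== undefined) {
      this.discard(this.client);
      this.client = undefined; // the next get() builds a fresh client
      this.consecutiveTimeouts = 0;
    }
  }
}
```

In a real service, `T` would be the PubSub client, and `discard` would close the old instance (for example through its `close()` method, if your library version exposes one). This is a guess at the failure mode. If the connections are exhausted rather than stale, recreating the client may not help.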

0 reactions
ForbesLindesay commented, Sep 2, 2022

@feywind Where can we view that linked issue?
