Publishing messages burst timeout
Environment details
- OS: Google Kubernetes Container
- Node.js version: 12.15.0
- npm version: -
- @google-cloud/pubsub version: 2.18.3
Steps to reproduce
- ?
- ?
We’re seeing an issue in our production environment. It happens pretty inconsistently, so I’m not sure exactly how to reproduce it.
This service consistently publishes messages to a couple of topics, at a volume of around 1 MiB per second. The errors come in bursts rather than steadily, and each burst comes from a single pod at a time (we run about 150 pods for this service). For example, we’ll see a burst of ~5k errors across all of the topics coming from pod A, and the next day we’ll see the same from pod B. This recurs every several hours or days. Rolling out the deployment or killing the offending pod resolves the errors for at least a few hours. The errors don’t resolve on their own quickly, or at least not within 20 minutes.
BTW, the pubsub instance is created once and reused for subsequent publishes.
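For context, the publish path looks roughly like the sketch below. This is a minimal illustration rather than the actual service code: the topic name and payload handling are placeholders, and it assumes the standard @google-cloud/pubsub 2.x Topic#publish() API with a single client created at startup.

```js
const {PubSub} = require('@google-cloud/pubsub');

// Single client and topic object created once at startup and reused for every publish.
const pubsub = new PubSub();
const topic = pubsub.topic('my-topic'); // illustrative topic name

async function publishEvent(payload) {
  const dataBuffer = Buffer.from(JSON.stringify(payload));
  // publish() resolves with the server-assigned message ID once the batch is sent.
  const messageId = await topic.publish(dataBuffer);
  return messageId;
}
```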
The error message and stack:
Error: Total timeout of API google.pubsub.v1.Publisher exceeded 600000 milliseconds before any response was received.
at repeat (/deploy/my-project/node_modules/google-gax/build/src/normalCalls/retries.js:66:31)
at Timeout._onTimeout (/deploy/my-project/node_modules/google-gax/build/src/normalCalls/retries.js:101:25)
at listOnTimeout (internal/timers.js:531:17)
at processTimers (internal/timers.js:475:7)
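The 600000 ms in that message is the total retry timeout that google-gax applies to the Publisher's publish calls. As a rough sketch of where that knob lives, retry/backoff settings can be supplied explicitly when using the low-level v1.PublisherClient; the retry codes and timeout values below are illustrative, not settings taken from this report.

```js
const {v1} = require('@google-cloud/pubsub');

const publisherClient = new v1.PublisherClient();

async function publishWithRetrySettings(projectId, topicName, data) {
  const formattedTopic = publisherClient.projectTopicPath(projectId, topicName);
  const request = {
    topic: formattedTopic,
    messages: [{data: Buffer.from(data)}],
  };

  // backoffSettings govern per-RPC and overall timeouts; totalTimeoutMillis is
  // the 600000 ms budget the error message above refers to.
  const retrySettings = {
    retryCodes: [
      4,  // DEADLINE_EXCEEDED
      10, // ABORTED
      14, // UNAVAILABLE
    ],
    backoffSettings: {
      initialRetryDelayMillis: 100,
      retryDelayMultiplier: 1.3,
      maxRetryDelayMillis: 60000,
      initialRpcTimeoutMillis: 5000,
      rpcTimeoutMultiplier: 1.0,
      maxRpcTimeoutMillis: 600000,
      totalTimeoutMillis: 600000,
    },
  };

  const [response] = await publisherClient.publish(request, {retry: retrySettings});
  return response.messageIds;
}
```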
Thanks! Please let me know what other information would be helpful.
Top GitHub Comments
This looks like a client-side issue. All pods but one are able to send requests to Pub/Sub and get a response back. Removing the bad pod is a good temporary fix, but to fix this for good we need to know what the bad pod is doing with those failed requests. Was it unable to send the requests in the first place, or was it unable to receive the responses? Chances are the client has exhausted its network connections, or the connection to the Pub/Sub endpoint was dropped but the client is unaware of it and is still using the broken connection without re-establishing a new one. How resilient is your pod in handling error conditions? Can you share a code snippet showing how messages are being published in the pod, and how it handles error conditions?
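For reference, a publish call with explicit error handling might look something like the following. This is an illustrative sketch rather than the reporter's actual code: the topic name, payload, and logging are assumptions, and it simply surfaces failures so bursts of the gax total-timeout error above show up in logs and metrics instead of being swallowed.

```js
const {PubSub} = require('@google-cloud/pubsub');

const pubsub = new PubSub();
const topic = pubsub.topic('my-topic'); // illustrative topic name

async function publishWithErrorHandling(payload) {
  const dataBuffer = Buffer.from(JSON.stringify(payload));
  try {
    const messageId = await topic.publish(dataBuffer);
    return messageId;
  } catch (err) {
    // Log and rethrow so the caller can decide whether to retry, drop, or alert.
    console.error('Publish failed:', err.message);
    throw err;
  }
}
```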
@feywind Where can we view that linked issue?