Catastrophic frequent random subscription freezes, especially on high-traffic topics.
Describe the bug
Topics randomly freeze, causing catastrophic topic outages on a weekly (or more frequent) basis. This has been an issue for as long as my team has used Pulsar, and it has been communicated to a number of folks on the Pulsar PMC.
(I thought an issue was already created for this bug, but I couldn’t find it anywhere.)
To Reproduce
We have not figured out how to reproduce the issue. It appears to be random (non-deterministic), and the broker logs do not seem to offer any clues.
Expected behavior
Topics should never randomly stop working to the point where the only resolution is restarting the affected broker.
Steps to Diagnose and Temporarily Resolve
Step 2: Check the rate out on the topic (click on the topic in the dashboard, or run stats on the topic and look at “msgRateOut”).
If the rate out is 0, this is likely a frozen topic. To verify, do the following:
In the Pulsar dashboard, click on the broker that the topic is living on. If multiple topics on that broker have a rate out of 0, proceed to the next step. If not, it could be a different issue; investigate further.
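The check in Step 2 can be sketched in Python. This is only a minimal illustration, not part of the original report: it assumes you have already collected the stats JSON for each topic on the suspect broker into a dict keyed by topic name, and the helper names (`is_likely_frozen`, `frozen_topics`, `broker_looks_frozen`) are hypothetical. The only field it relies on is “msgRateOut”, as mentioned above.

```python
from typing import Dict, List


def is_likely_frozen(stats: dict) -> bool:
    """Return True when the topic stats report a rate out of exactly 0.

    A missing "msgRateOut" field is treated as unknown, not as frozen.
    """
    rate_out = stats.get("msgRateOut")
    if rate_out is None:
        return False
    return float(rate_out) == 0.0


def frozen_topics(stats_by_topic: Dict[str, dict]) -> List[str]:
    """List (sorted) the topics whose rate out is 0."""
    return sorted(
        topic for topic, stats in stats_by_topic.items() if is_likely_frozen(stats)
    )


def broker_looks_frozen(stats_by_topic: Dict[str, dict]) -> bool:
    """Apply the rule from Step 2: multiple topics on the broker with rate out 0."""
    return len(frozen_topics(stats_by_topic)) > 1


# Hypothetical stats for three topics living on the same broker.
sample = {
    "persistent://tenant/ns/orders": {"msgRateOut": 0.0},
    "persistent://tenant/ns/events": {"msgRateOut": 0.0},
    "persistent://tenant/ns/audit": {"msgRateOut": 153.2},
}
suspects = frozen_topics(sample)
```

A rate out of 0 can also simply mean an idle topic with no consumers or no traffic, which is why the rule looks for several such topics on one broker before treating the broker as the problem.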
Step 3: Stop the broker on the server that the topic is living on: `pulsar-broker stop`.
Step 4: Wait for the backlog to be consumed and all the functions to be rescheduled. (typically wait for about 5-10 mins)
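The wait in Step 4 could be automated with a small polling loop. The following is a sketch under stated assumptions: `fetch_stats` is a hypothetical callable that you supply and that returns the current stats JSON for the topic as a dict, and the backlog is assumed to be reported per subscription under `subscriptions` → `msgBacklog` (verify this against the stats output of your Pulsar version). The default timeout of 600 seconds mirrors the 5-10 minute wait mentioned above. It does not check whether functions have been rescheduled.

```python
import time
from typing import Callable


def total_backlog(stats: dict) -> int:
    """Sum the per-subscription backlog found in a topic stats dict."""
    subscriptions = stats.get("subscriptions", {})
    return sum(int(sub.get("msgBacklog", 0)) for sub in subscriptions.values())


def wait_for_drain(
    fetch_stats: Callable[[], dict],
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the backlog reaches 0; return False if the timeout expires first."""
    deadline = clock() + timeout_s
    while True:
        if total_backlog(fetch_stats()) == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The `sleep` and `clock` parameters are injectable only so the loop can be exercised without real waiting; in normal use the defaults are sufficient.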
Environment:
Docker on bare metal running `apachepulsar/pulsar-all:2.4.0` on CentOS.
Brokers are the function workers.
This has been an issue with previous versions of Pulsar as well.
Additional context
Problem was MUCH worse with Pulsar 2.4.2, so our team needed to roll back to 2.4.0 (which has the problem, but it’s less frequent). This is preventing the team from progressing in the use of Pulsar, and it’s causing SLA problems with those who use our service.
Issue Analytics
- Created 4 years ago
- Comments: 94 (83 by maintainers)
Top GitHub Comments
This bug has been resolved in DataStax Luna Streaming 2.7.2_1.1.21
We (StreamNative) have been helping folks from Tencent develop a feature called ReadOnly brokers. It allows a topic to have multiple owners (one writable owner and multiple read-only owners). It has been running in production for a while. They will contribute it back soon.