Catastrophic frequent random subscription freezes, especially on high-traffic topics.

Describe the bug

Topics randomly freeze, causing catastrophic topic outages on a weekly (or more frequent) basis. This has been an issue for as long as my team has used Pulsar, and it has been communicated to a number of people on the Pulsar PMC.

(I thought an issue was already created for this bug, but I couldn’t find it anywhere.)

To Reproduce

We have not figured out how to reproduce the issue. It appears to be random (non-deterministic), and the broker logs do not seem to contain any clues.

Expected behavior

Topics should never randomly stop working to the point where the only resolution is restarting the problem broker.

Steps to Diagnose and Temporarily Resolve

Step 2: Check the rate out on the topic (click on the topic in the dashboard, or run stats on the topic and look at “msgRateOut”).

If the rate out is 0 this is likely a frozen topic, but to verify do the following:

In the Pulsar dashboard, click on the broker that the topic is living on. If multiple topics on that broker have a rate out of 0, proceed to the next step. If not, it could be another issue, so investigate further.
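The two checks above can be sketched in Python. This is a minimal sketch and not part of the original report: it assumes the stats for each topic on the broker have already been fetched as JSON (for example with `pulsar-admin topics stats`) and parsed into dictionaries. The function names, the `min_topics` threshold, and the sample topic names are hypothetical; `msgRateOut` is the field named in Step 2.

```python
from typing import Dict, List


def is_rate_out_zero(stats: dict) -> bool:
    """Return True when the topic-level msgRateOut is present and equal to 0."""
    rate = stats.get("msgRateOut")
    if rate is None:
        # Field missing: treat as "unknown" rather than as a frozen topic.
        return False
    return float(rate) == 0.0


def topics_with_zero_rate_out(stats_by_topic: Dict[str, dict]) -> List[str]:
    """Return the sorted names of topics whose msgRateOut is 0."""
    return sorted(
        name for name, stats in stats_by_topic.items() if is_rate_out_zero(stats)
    )


def broker_looks_frozen(stats_by_topic: Dict[str, dict], min_topics: int = 2) -> bool:
    """Heuristic from the steps above: several topics on one broker at rate out 0.

    min_topics is an assumed threshold for "multiple topics".
    """
    return len(topics_with_zero_rate_out(stats_by_topic)) >= min_topics


# Hypothetical stats for three topics that live on the same broker.
sample_stats = {
    "persistent://tenant/ns/topic-a": {"msgRateIn": 120.0, "msgRateOut": 0.0},
    "persistent://tenant/ns/topic-b": {"msgRateIn": 45.5, "msgRateOut": 0.0},
    "persistent://tenant/ns/topic-c": {"msgRateIn": 10.0, "msgRateOut": 9.8},
}
```

A rate out of 0 on a single topic can also mean that nothing is being consumed for ordinary reasons, which is why the broker-level count is used as the confirming signal here.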

Step 3: Stop the broker on the server that the topic is living on: `pulsar-broker stop`.

Step 4: Wait for the backlog to be consumed and all the functions to be rescheduled (typically about 5-10 minutes).
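The wait in Step 4 could be automated with a small polling loop. This is a sketch under stated assumptions, not the procedure the team used: `fetch_stats` is a hypothetical callable that returns the parsed topic stats JSON, and the code assumes that JSON has a `subscriptions` mapping whose entries carry a `msgBacklog` count. The default timeout of 600 seconds mirrors the upper end of the 5-10 minute estimate, and the 15 second poll interval is arbitrary.

```python
import time
from typing import Callable


def total_backlog(stats: dict) -> int:
    """Sum msgBacklog over all subscriptions in one topic's stats."""
    subscriptions = stats.get("subscriptions", {})
    return sum(int(sub.get("msgBacklog", 0)) for sub in subscriptions.values())


def wait_for_backlog_drain(
    fetch_stats: Callable[[], dict],
    timeout_s: float = 600.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the topic backlog reaches 0 or the timeout expires.

    Returns True if the backlog drained, False if the timeout was hit first.
    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if total_backlog(fetch_stats()) == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

This only covers the backlog half of Step 4. Confirming that the functions have been rescheduled would need a separate check.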

Environment:

Docker on bare metal running `apachepulsar/pulsar-all:2.4.0` on CentOS.
Brokers are the function workers.

This has been an issue with previous versions of Pulsar as well.

Additional context

The problem was MUCH worse with Pulsar 2.4.2, so our team needed to roll back to 2.4.0 (which has the problem, but less frequently). This is preventing the team from progressing in its use of Pulsar, and it is causing SLA problems for those who use our service.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 94 (83 by maintainers)

Top GitHub Comments

2 reactions
devinbost commented, Dec 17, 2021

This bug has been resolved in DataStax Luna Streaming 2.7.2_1.1.21

2 reactions
sijie commented, Jan 21, 2020

I noticed that each topic lives on a single broker, which creates a single point of failure. Is there any interest in making topics higher availability?

We (StreamNative) have been helping folks from Tencent develop a feature called ReadOnly brokers. It allows a topic to have multiple owners (one writeable owner and multiple read-only owners). It has been running in production for a while. They will contribute it back soon.


Top Results From Across the Web

[GitHub] [pulsar] devinbost commented on issue #6054: Catastrophic frequent random subscription freezes, especially on high-traffic topics.
