Describe the bug
We noticed that some messages were not getting processed by our consumer with a Key_Shared subscription. We only had one consumer at that time, so we checked the subscription via the pulsar-admin topics stats command and noticed that it was showing two consumers instead of one.
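
For reference, a Key_Shared consumer like the one described here would be created roughly as follows with the current official pure-Go client (a minimal sketch only: the topic and subscription name come from this report, while the service URL and everything else are illustrative; the original setup actually used the older Go wrapper around the C++ client library v2.5.0):

    package main

    import (
        "context"
        "log"

        "github.com/apache/pulsar-client-go/pulsar"
    )

    func main() {
        // The service URL is illustrative; the report only tells us the client
        // connected through the Pulsar Proxy at 10.1.1.76.
        client, err := pulsar.NewClient(pulsar.ClientOptions{
            URL: "pulsar://pulsar-proxy:6650",
        })
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        // Key_Shared subscription on the topic and subscription name from the report.
        consumer, err := client.Subscribe(pulsar.ConsumerOptions{
            Topic:            "persistent://public/default/UserJoinedSpace",
            SubscriptionName: "cloud-spaceroom-service",
            Type:             pulsar.KeyShared,
        })
        if err != nil {
            log.Fatal(err)
        }
        defer consumer.Close()

        for {
            msg, err := consumer.Receive(context.Background())
            if err != nil {
                log.Fatal(err)
            }
            // ... process the message (dispatch is keyed by the message key) ...
            consumer.Ack(msg)
        }
    }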

The application is deployed on a Kubernetes cluster, so we scaled all the pods down to make sure there could be no consumers at all, but the stats still showed the ghost consumer.

Before scaling all the pods down:

   "cloud-spaceroom-service" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 21,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 21,
      "type" : "Key_Shared",
      "msgRateExpired" : 0.0,
      "lastExpireTimestamp" : 0,
      "consumers" : [ {
        "msgRateOut" : 0.0,
        "msgThroughputOut" : 0.0,
        "msgRateRedeliver" : 0.0,
        "consumerName" : "a0869e4030",
        "availablePermits" : 0,
        "unackedMessages" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "metadata" : { },
        "address" : "/10.1.1.76:33118",
        "connectedSince" : "2020-03-24T09:47:06.109Z"
      }, {
        "msgRateOut" : 0.0,
        "msgThroughputOut" : 0.0,
        "msgRateRedeliver" : 0.0,
        "consumerName" : "0c7db0c8c0",
        "availablePermits" : 1000,
        "unackedMessages" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "metadata" : { },
        "address" : "/10.1.1.76:45426",
        "connectedSince" : "2020-03-24T13:53:45.761Z"
      } ],
      "isReplicated" : false
    },

After scaling all the pods down:

"cloud-spaceroom-service" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 21,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 21,
      "type" : "Key_Shared",
      "msgRateExpired" : 0.0,
      "lastExpireTimestamp" : 0,
      "consumers" : [ {
        "msgRateOut" : 0.0,
        "msgThroughputOut" : 0.0,
        "msgRateRedeliver" : 0.0,
        "consumerName" : "a0869e4030",
        "availablePermits" : 0,
        "unackedMessages" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "metadata" : { },
        "address" : "/10.1.1.76:33118",
        "connectedSince" : "2020-03-24T09:47:06.109Z"
      } ],
      "isReplicated" : false
    },

To get rid of the ghost consumer we had to kill the pod running the Pulsar broker. Once Kubernetes restarted the broker, the stats command finally showed only the connected consumers.
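
If the broker ever has to be bounced like this again, the verification step can be scripted. Below is a hedged sketch that pulls the same per-subscription consumer list over the broker's admin REST API instead of pulsar-admin; the /admin/v2/persistent/.../stats path is the standard topic stats endpoint, while the host, port, and struct names are assumptions for illustration:

    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    // topicStats models only the slice of the stats payload we care about;
    // the field names match the output shown above.
    type topicStats struct {
        Subscriptions map[string]struct {
            Consumers []struct {
                ConsumerName   string `json:"consumerName"`
                Address        string `json:"address"`
                ConnectedSince string `json:"connectedSince"`
            } `json:"consumers"`
        } `json:"subscriptions"`
    }

    func main() {
        // Same data as `pulsar-admin topics stats`, fetched over the admin REST API.
        // Host and port are assumptions; point this at your broker's admin endpoint.
        url := "http://pulsar-broker:8080/admin/v2/persistent/public/default/UserJoinedSpace/stats"

        resp, err := http.Get(url)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        var stats topicStats
        if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
            log.Fatal(err)
        }

        // Print every consumer the broker still believes is attached,
        // so a ghost like a0869e4030 stands out immediately.
        for sub, s := range stats.Subscriptions {
            for _, c := range s.Consumers {
                fmt.Printf("subscription=%s consumer=%s address=%s since=%s\n",
                    sub, c.ConsumerName, c.Address, c.ConnectedSince)
            }
        }
    }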

To Reproduce
I haven’t been able to replicate the issue yet. I do have all the logs centralized and accessible, though. It would be very helpful if you could help us understand what went wrong.

Expected behavior
I would have expected the consumers to disappear after scaling down all the consumer pods, which is what happened for the connected consumer but not for the ghost one (i.e. a0869e4030, see the stats output above).

Unfortunately, messages were still being routed to the ghost consumer.

Additional context

Known information about the ghost consumer:

  • Consumer name: a0869e4030
  • Subscriber name: cloud-spaceroom-service
  • Subscription type: Key shared
  • Connected since: 09:47:06.109Z
  • Address: /10.1.1.76:33118
  • Topic: /public/default/UserJoinedSpace
  • Persistent topic: true

Additional information:

  • Pulsar Proxy was in use (i.e. 10.1.1.76)
  • Client was using the official Golang lib with the underlying C++ library v2.5.0
  • There was only one consumer with that subscriber name at that time
  • The subscription for that topic was reported by the broker at 09:46:58.581831923Z
  • Prior to establishing the connection, the client reported several times (over a period of ~6 s, from 09:46:57.788 to 09:47:03.631) that it was unable to reconnect its consumer and kept rescheduling reconnections
  • Around the same time (~09:47:03.558313436Z) the broker reported that it was creating 2 subscriptions for that topic and that it already had a consumer with id 18 present on the connection:
    • Consumer with id 18 is already present on the connection
  • The client reported a few times (~09:47:03.616168539Z) that it couldn’t reconnect the consumer due to an unknown error
  • All logs within the Pulsar namespace go silent about the UserJoinedSpace topic at 09:47:06.367671431Z and start again at 09:51:21.172230502Z, a gap of 254805 ms (~4.25 minutes)

More related logs:

  • Removed consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/UserJoinedSpace, name=nodes-service}, consumerId=0, consumerName=71ea78f19e, address=/10.1.1.76:33122} with pending 0 acks
  • [/10.1.1.76:33118] Cleared consumer created after timeout on client side Consumer{subscription=PersistentSubscription{topic=persistent://public/default/UserJoinedSpace, name=cloud-spaceroom-service}, consumerId=18, consumerName=a0869e4030, address=/10.1.1.76:33118}

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
codelipenghui commented, May 19, 2020

@fracasula I think 2.5.1 does not contain a bugfix for this issue. It looks like the simplest way to resolve this problem is to close the current connection and reconnect to the broker when a timeout occurs, so that the broker can clean up the old consumers.

The other way is to introduce a consumer heartbeat mechanism, which looks somewhat complicated.

1 reaction
jiazhai commented, Apr 2, 2020

@codelipenghui Thanks for the root cause analysis, and thanks @fracasula for the feedback. It seems we need to remove the consumer when the timeout occurs.
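
For illustration, the client-side workaround suggested above (drop the connection and re-subscribe when consumer creation times out, so the broker cleans up the stale consumer) amounts to something like the following application-level sketch against the pure-Go client. The function name and retry policy are hypothetical, and the actual fix belongs inside the client library rather than in application code:

    package pulsarutil

    import (
        "log"
        "time"

        "github.com/apache/pulsar-client-go/pulsar"
    )

    // subscribeWithReset is a hypothetical, application-level analogue of the
    // suggested fix: if creating the consumer fails (for example, times out),
    // close the whole client so the broker tears down the connection and any
    // half-registered consumer on it, then retry on a fresh connection rather
    // than reusing the old one.
    func subscribeWithReset(clientOpts pulsar.ClientOptions, consumerOpts pulsar.ConsumerOptions) (pulsar.Client, pulsar.Consumer, error) {
        for {
            client, err := pulsar.NewClient(clientOpts)
            if err != nil {
                return nil, nil, err
            }

            consumer, err := client.Subscribe(consumerOpts)
            if err == nil {
                return client, consumer, nil
            }

            // Closing the client drops the TCP connection; the broker removes
            // the consumers registered on that connection when it closes.
            log.Printf("subscribe failed, resetting connection and retrying: %v", err)
            client.Close()
            time.Sleep(time.Second)
        }
    }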
