
Messages lost with new topic and regex subscription


Describe the bug

When a new topic is detected by a regex subscription, it takes time before the subscription's cursor is set up for that topic. Because the cursor is positioned at the end of the topic, at least one message is lost, and since the setup can take 40 seconds, up to 40 seconds of data can be lost.

To Reproduce

If I set up a consumer with a regex subscription, for example:

/opt/pulsar/bin/pulsar-client consume --regex '.*' -s all -n 0

I then send a message on a NEW topic that matches the regex.

/opt/pulsar/bin/pulsar-client produce addtopic -m 'm1'

The consumer detects the new topic and sets up a subscription to it. This can take 30-40 seconds. However, it does not see the message (or any other messages sent before the subscription is set up).

Once it is set up, sending more data to the topic will be picked up by the consumer.

/opt/pulsar/bin/pulsar-client produce addtopic -m 'm2'

The consumer will display the message ‘m2’.

So although it works from then on, up to the first 40 seconds of data may have been lost.
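The 30-40 second delay is consistent with the consumer discovering new topics by polling rather than being notified. As far as we know, the Java client (which `pulsar-client` is built on) re-evaluates the regex on a timer controlled by `patternAutoDiscoveryPeriod`, defaulting to one minute; treat that as an assumption for your version. The toy model below is not Pulsar code, and `next_poll` and `delivered_with_latest` are hypothetical names. It shows why a cursor created at the latest position drops everything published before the next poll:

```python
import math

# Toy model (not Pulsar code): the consumer re-checks the regex every
# `period` seconds, starting at t=0 when it subscribed.
def next_poll(created_at: float, period: float) -> float:
    """First discovery poll at or after the moment the topic appears."""
    return math.ceil(created_at / period) * period

def delivered_with_latest(messages, discovered_at):
    """A cursor created at the end of the topic only sees later messages."""
    return [body for t, body in messages if t >= discovered_at]

# (publish time in seconds, payload); the first publish creates the topic.
messages = [(5.0, "m1"), (70.0, "m2")]
discovered = next_poll(messages[0][0], period=60.0)
print(discovered)                                   # 60.0
print(delivered_with_latest(messages, discovered))  # ['m2']
```

In this model the window of loss is at most one discovery period, so shortening the period narrows the gap but cannot close it.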

Expected behavior

All messages sent to the new topic should be seen by the consumer.

Screenshots

N/A

Desktop

CentOS 7, Pulsar 2.5.0 and 2.5.1

Additional context

The initial message(s) are on the topic; one can see them with a reader. So a solution would be for the cursor of the new topic's subscription to be created pointing to the start of the topic, rather than the usual end, in this case.
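The reporter's suggestion can be pictured with a toy model in which a topic is an append-only list and a cursor is an index into it. `Position`, `create_cursor` and `read_all` are hypothetical names, not Pulsar's API. The point is only that a cursor created at the start of the topic still sees the message a reader can see, while one created at the end does not:

```python
from enum import Enum

class Position(Enum):
    LATEST = "latest"
    EARLIEST = "earliest"

# Toy model: a topic is an append-only list; a cursor is an index into it.
def create_cursor(log: list, position: Position) -> int:
    """Earliest points at the first entry, Latest just past the last one."""
    return 0 if position is Position.EARLIEST else len(log)

def read_all(log: list, cursor: int) -> list:
    """Return every entry from the cursor to the end of the log."""
    return log[cursor:]

topic = ["m1"]                    # published before the topic was discovered
latest = create_cursor(topic, Position.LATEST)
earliest = create_cursor(topic, Position.EARLIEST)
topic.append("m2")                # published after the cursors exist
print(read_all(topic, latest))    # ['m2']
print(read_all(topic, earliest))  # ['m1', 'm2']
```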

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
BewareMyPower commented, Jun 11, 2020

The same applies to a partitioned consumer. IMO, when a consumer finds new topics/partitions, the subscription's initial position should be changed to earliest, no matter what the original initial position is.

Usually consumers use the latest initial position to discard outdated messages. However, suppose the number of partitions is increased dynamically while producers and consumers are actively serving this partitioned topic. If the producers discover the new partitions before the consumers do, then from the consumer's point of view the messages published there before it starts consuming shouldn't be considered outdated.

What do you think of this change? @sijie
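The proposal above amounts to a per-partition decision. A minimal sketch, assuming partitions are identified by integer index and positions by the strings `"latest"` and `"earliest"` (both assumptions); `initial_position` is a hypothetical helper, not part of any Pulsar client:

```python
# Sketch of the proposed rule (hypothetical helper, not Pulsar code):
# honour the configured position only for partitions that existed when the
# consumer subscribed; anything discovered later starts from the beginning.
def initial_position(partition: int, known_at_subscribe: set, configured: str) -> str:
    if partition in known_at_subscribe:
        return configured
    return "earliest"

known = {0, 1, 2}        # partitions present when the consumer subscribed
for p in range(4):       # partition 3 was added later
    print(p, initial_position(p, known, "latest"))
# 0 latest
# 1 latest
# 2 latest
# 3 earliest
```

The trade-off is that a consumer configured for latest may still receive a backlog from a newly added partition, which is exactly the behavior the comment argues for.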

0 reactions
vitosans commented, Sep 12, 2022

@sijie - What is the official position on this? Is it suggested to use earliest? I see it has gone stale and has not been updated for two years. We are running into this issue, which is counter-intuitive to how a queue should work.
