Key_Shared subscription is not evenly distributed

I tested Pulsar 2.7.1 in Docker on my local machine; the Java client version is also 2.7.1. One producer and four consumers run in a single app container, configured as shown below. The producer sends messages in order, with message keys cycling from test-0 to test-9.

private static final AtomicLong KEY_GENERATOR = new AtomicLong();

private static String randomKey() {
    var idx = KEY_GENERATOR.getAndIncrement() % 10;
    return "test-" + idx;
}

@Bean
public Producer<String> producer(PulsarClient client) throws PulsarClientException {
    return client.newProducer(Schema.STRING)
            .topic("topic-test")
            .sendTimeout(10, TimeUnit.SECONDS)
            .create();
}

@Bean
public List<Consumer<String>> consumer(PulsarClient client) throws PulsarClientException {
    var consumers = new ArrayList<Consumer<String>>(4);
    for (var i = 0; i < 4; i++) {
        var consumer = client.newConsumer(Schema.STRING)
                .topic("topic-test")
                .subscriptionName("sub-test")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();
        consumers.add(consumer);
    }
    return consumers;
}

At first I ran only one app container. The consumer statistics show that the keys are not evenly distributed:

{
    "consumer-thread-2":[
        "test-2",
        "test-1",
        "test-0",
        "test-6",
        "test-4"
    ],
    "consumer-thread-3":[
        "test-9"
    ],
    "consumer-thread-0":[
        "test-3"
    ],
    "consumer-thread-1":[
        "test-8",
        "test-7",
        "test-5"
    ]
}

Then, 30 seconds later, I started a second app container, so two containers were running at the same time.

The consumer statistics of the first container changed:

{
    "consumer-thread-2":[
        "test-4"
    ],
    "consumer-thread-3":[
        "test-9"
    ],
    "consumer-thread-0":[
        "test-3"
    ],
    "consumer-thread-1":[
        "test-8",
        "test-7"
    ]
}

And the consumer statistics of the newer container look like this:

{
    "consumer-thread-0":[
        "test-2",
        "test-1",
        "test-0",
        "test-6"
    ],
    "consumer-thread-1":[
        "test-5"
    ]
}

Some of the consumers in the newer container are not assigned any messages at all.

Are there any best practices for Key_Shared subscriptions? I think this is a common situation in production environments: app containers may be deployed dynamically, which changes the set of consumers on the subscription.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
t3link commented, May 10, 2021

@codelipenghui Thanks! After having a look at PersistentStickyKeyDispatcherMultipleConsumers.java, I think none of the three selector strategies is suitable for my situation, because they all use Murmur3_32Hash to calculate a slot or range. That would likely work well when the message key set is large. I'll consider splitting my messages into several independent topics, each of which can then be subscribed to using ConsistentHashingStickyKeyConsumerSelector. Those topics can also be selected by a simple modulo operation.
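
The topic-splitting idea above could look roughly like the following sketch. It assumes keys of the form test-N, as in the original post. The topic names topic-test-N and the helper topicFor are hypothetical illustrations and are not part of Pulsar.

```java
public class TopicRouterSketch {
    // Hypothetical helper: maps a key such as "test-7" to one of topicCount topics
    // by taking the key's numeric suffix modulo topicCount.
    static String topicFor(String key, int topicCount) {
        long n = Long.parseLong(key.substring(key.lastIndexOf('-') + 1));
        return "topic-test-" + Math.floorMod(n, (long) topicCount);
    }

    public static void main(String[] args) {
        // Print the topic each of the ten keys from the original post would map to.
        for (int i = 0; i < 10; i++) {
            String key = "test-" + i;
            System.out.println(key + " -> " + topicFor(key, 5));
        }
    }
}
```

With ten sequential keys and five topics, each topic carries exactly two keys, so the split no longer depends on how a hash function happens to scatter a small key set.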

0 reactions
codelipenghui commented, May 10, 2021

@mrkingfoxx Messages for a Key_Shared subscription are dispatched by the hash of the key. By default there are 65,536 hash slots, [0, 65535], and each consumer receives messages from a fixed hash range, so an even distribution of keys can't be guaranteed. You can try consistent hashing for the Key_Shared subscription:

# On KeyShared subscriptions, with default AUTO_SPLIT mode, use splitting ranges or
# consistent hashing to reassign keys to new consumers
subscriptionKeySharedUseConsistentHashing=false

# On KeyShared subscriptions, number of points in the consistent-hashing ring.
# The higher the number, the more equal the assignment of keys to consumers
subscriptionKeySharedConsistentHashingReplicaPoints=100

You can also increase the replica points to get a more even distribution.
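
To make the two strategies in this comment concrete, here is a minimal, self-contained sketch that assigns the keys test-0 to test-9 to four consumers, once by fixed hash ranges and once by a consistent-hashing ring. It is not the broker's implementation: CRC32 stands in for Murmur3, the fixed ranges are assumed to be equal-sized, and the class and method names are hypothetical, so the concrete assignments will differ from what a real broker produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class KeySharedSketch {
    static final int SLOTS = 65536; // hash slots [0, 65535], as described in the comment

    // Stand-in hash (assumption): CRC32 instead of the broker's Murmur3.
    static int slotOf(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % SLOTS);
    }

    // Fixed ranges: each consumer owns one contiguous, equal-sized block of slots (assumption).
    static int byFixedRange(String key, int consumers) {
        int rangeSize = SLOTS / consumers;
        return Math.min(slotOf(key) / rangeSize, consumers - 1);
    }

    // Consistent hashing: each consumer is placed on the ring `points` times.
    static TreeMap<Integer, Integer> buildRing(int consumers, int points) {
        TreeMap<Integer, Integer> ring = new TreeMap<>();
        for (int c = 0; c < consumers; c++) {
            for (int p = 0; p < points; p++) {
                ring.putIfAbsent(slotOf("consumer-" + c + "#" + p), c);
            }
        }
        return ring;
    }

    // A key goes to the first ring point at or after its slot, wrapping around at the end.
    static int byRing(TreeMap<Integer, Integer> ring, String key) {
        Map.Entry<Integer, Integer> e = ring.ceilingEntry(slotOf(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Counts how many of the keys test-0 .. test-(keys-1) each consumer receives.
    static int[] countFixed(int keys, int consumers) {
        int[] counts = new int[consumers];
        for (int i = 0; i < keys; i++) counts[byFixedRange("test-" + i, consumers)]++;
        return counts;
    }

    static int[] countRing(int keys, int consumers, int points) {
        TreeMap<Integer, Integer> ring = buildRing(consumers, points);
        int[] counts = new int[consumers];
        for (int i = 0; i < keys; i++) counts[byRing(ring, "test-" + i)]++;
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("fixed ranges, 10 keys:       " + Arrays.toString(countFixed(10, 4)));
        System.out.println("ring (100 points), 10 keys:  " + Arrays.toString(countRing(10, 4, 100)));
        System.out.println("ring (100 points), 10k keys: " + Arrays.toString(countRing(10_000, 4, 100)));
    }
}
```

Running main prints the per-consumer key counts for each case. With only ten distinct keys, either strategy can leave a consumer with few or no keys. The counts tend to even out only as the number of distinct keys grows.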
