Kafka: allow message key and partition to be chosen independently

If a PartitionKeyStrategy is used with a topic, the value is used as the message key, and is then implicitly used to select the partition according to the default behavior of the Kafka client:

If a valid partition number is specified that partition will be used when sending the record. If no partition is specified but a key is present a partition will be chosen using a hash of the key. If neither key nor partition is present a partition will be assigned in a round-robin fashion.
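The three rules quoted above can be sketched as a small selection function. This is only an illustration of the order of precedence, not the Kafka client's code: `PartitionSelection` and `choosePartition` are hypothetical names, the round-robin counter is passed in by the caller, and `String.hashCode` stands in for the murmur2 hash that Kafka's default partitioner applies to the serialized key bytes.

```scala
object PartitionSelection {
  // Hypothetical helper that mirrors the quoted rules; not part of any Kafka API.
  def choosePartition(
      explicitPartition: Option[Int],
      key: Option[String],
      numPartitions: Int,
      roundRobinCounter: Int
  ): Int =
    explicitPartition match {
      // Rule 1: an explicit partition always wins.
      case Some(partition) => partition
      case None =>
        key match {
          // Rule 2: otherwise hash the key (assumption: hashCode stands in for murmur2).
          case Some(k) => Math.floorMod(k.hashCode, numPartitions)
          // Rule 3: with neither key nor partition, rotate through the partitions.
          case None => Math.floorMod(roundRobinCounter, numPartitions)
        }
    }
}
```

Because the key is only consulted when no partition is given, a record can in principle carry a key for compaction while being routed by an explicitly chosen partition.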

It might be desirable in some cases to control these independently. For example, you might wish to have a message key that is more fine-grained than the partition key, for use with Kafka log compaction on sub-graphs of the entity state.

Issue Analytics

State:
Created 6 years ago
Comments:11 (6 by maintainers)

Top GitHub Comments

datalchemist commented, Nov 6, 2019

@ignasi35 Yes, my step 1 was indeed derived from @jroper's suggestion. But in my step 2, I meant actually using the new property in the Lagom kafka-broker internals. Anyway, I think I am on the right track.

I have looked at the details and the implementation seems quite straightforward. But I was a bit puzzled by the interdependence between a message key and a partition number. I am not a Kafka expert at all, but from what I can see in their Producer API, we can have the following cases:

no key and no partition number
one key but no partition number
one key and one partition number

but we can’t have a partition number without a key.

Given that, it doesn’t seem like a good idea to have two distinct properties such as PartitionKeyStrategy and PartitionNumberStrategy, because if the user defines the second without the first, we have a problem. One solution I see would be a more general strategy that covers both key and partition generation, e.g.:

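trait MessagePartitionStrategy[-Message] {
  // Key attached to the Kafka record.
  def computeMessageKey(message: Message): String
  // Explicit partition, or None to leave the choice to the Kafka client.
  def computePartitionNumber(message: Message): Option[Int]
}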

Hence, it would be possible to define the message key alone, or the key and the partition.
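As a rough sketch of how such a strategy might be implemented, the example below keys each record by entity and sub-graph while routing by entity alone, which is the use case described in the issue. The trait is repeated from the comment so the snippet compiles on its own; `OrderEvent`, `OrderEventStrategy`, and the `numPartitions` parameter are hypothetical and exist only for illustration, and none of this is an existing Lagom API.

```scala
// Trait as proposed in the comment, repeated so the sketch is self-contained.
trait MessagePartitionStrategy[-Message] {
  def computeMessageKey(message: Message): String
  def computePartitionNumber(message: Message): Option[Int]
}

// Hypothetical event: `section` names a sub-graph of the order's state.
final case class OrderEvent(orderId: String, section: String)

// Hypothetical strategy: fine-grained key, coarse-grained partition.
final class OrderEventStrategy(numPartitions: Int)
    extends MessagePartitionStrategy[OrderEvent] {

  // The key combines entity id and section, so compaction keeps
  // the latest record per section of each order.
  def computeMessageKey(event: OrderEvent): String =
    s"${event.orderId}/${event.section}"

  // The partition depends on the entity id only, so all events
  // for one order stay in the same partition.
  def computePartitionNumber(event: OrderEvent): Option[Int] =
    Some(Math.floorMod(event.orderId.hashCode, numPartitions))
}
```

With this strategy, two events for the same order but different sections get different keys and the same partition. A strategy that returns None from computePartitionNumber would fall back to the client's key-based partitioning.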

Of course, this would mean more changes, because this new type would replace the existing PartitionKeyStrategy (although the two could also coexist for a while, with the new type taking precedence).

Does such an approach seem suitable? Do you have another idea for handling this link between message key and partition?

jroper commented, Apr 6, 2017

The approach we’ve taken to adding properties to a topic means that this should be straightforward to add without impacting existing APIs and functionality.
