Kafka: allow message key and partition to be chosen independently

If a PartitionKeyStrategy is used with a topic, the value it produces is used as the message key, which then implicitly selects the partition according to the default behavior of the Kafka client:

If a valid partition number is specified that partition will be used when sending the record. If no partition is specified but a key is present a partition will be chosen using a hash of the key. If neither key nor partition is present a partition will be assigned in a round-robin fashion.
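
The selection order quoted above can be sketched as a small function. This is only an illustration: `PartitionChoice` and `choosePartition` are hypothetical names, `String.hashCode` stands in for the hash the Kafka client applies to the serialized key bytes, and rejecting an out-of-range partition with an exception is a choice made for this sketch.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper illustrating the order: explicit partition, then key hash, then round-robin.
object PartitionChoice {
  // Counter used only for the "neither key nor partition" case.
  private val roundRobin = new AtomicInteger(0)

  def choosePartition(
      explicitPartition: Option[Int],
      key: Option[String],
      numPartitions: Int
  ): Int =
    (explicitPartition, key) match {
      // 1. A valid explicit partition always wins, whether or not a key is present.
      case (Some(p), _) if p >= 0 && p < numPartitions => p
      // Assumption for this sketch: an out-of-range partition is an error.
      case (Some(p), _) =>
        throw new IllegalArgumentException(s"Invalid partition $p for $numPartitions partitions")
      // 2. No partition but a key: hash the key (stand-in hash, not Kafka's own).
      case (None, Some(k)) => Math.floorMod(k.hashCode, numPartitions)
      // 3. Neither key nor partition: rotate through the partitions.
      case (None, None) => Math.floorMod(roundRobin.getAndIncrement(), numPartitions)
    }
}
```

With only a key strategy available, a topic can reach the second case but never the first, which is why the key and the partition cannot currently be chosen independently.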

It might be desirable in some cases to control these independently. For example, you might wish to have a message key that is more fine-grained than the partition key, for use with Kafka log compaction on sub-graphs of the entity state.

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

1 reaction
datalchemist commented, Nov 6, 2019

@ignasi35 Yes, my step 1 was indeed derived from @jroper's suggestion. But in my step 2, I actually meant using the new property in the Lagom kafka-broker internals. Anyway, I think I am on the right track.

I have looked at the details and the implementation seems quite straightforward. But I was a bit puzzled by the interdependence between a message key and a partition number. I am not a Kafka expert at all, but from what I can see in their Producer API, we can have the following cases:

  • no key and no partition number
  • one key but no partition number
  • one key and one partition number

but we can’t have a partition number without a key.

So, from that, it does not seem wise to have two distinct properties such as PartitionKeyStrategy and PartitionNumberStrategy, because if the user defines the second without the first, we have a problem. One solution I see would be a more general strategy that covers both key and partition generation, for example:

trait MessagePartitionStrategy[-Message] {
  def computeMessageKey(message: Message): String
  def computePartitionNumber(message: Message): Option[Int]
}

Hence, it would be possible to define the message key alone, or the key and the partition.
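
As a rough illustration of how the proposed trait might be used, here is a minimal sketch. The trait is repeated from the comment so the example compiles on its own; `OrderLineUpdated`, `OrderLineStrategy`, `KeyOnlyStrategy`, and the `numPartitions` constructor parameter are hypothetical names invented for this example and are not part of Lagom. The sketch assumes the partition count is known to the strategy up front.

```scala
// The trait as proposed in this comment.
trait MessagePartitionStrategy[-Message] {
  def computeMessageKey(message: Message): String
  def computePartitionNumber(message: Message): Option[Int]
}

// Hypothetical event type, invented for this example.
final case class OrderLineUpdated(orderId: String, lineId: String, quantity: Int)

// Hypothetical strategy: a fine-grained key (order plus line) for log compaction,
// with the partition derived from the coarser orderId so one order stays on one partition.
final class OrderLineStrategy(numPartitions: Int)
    extends MessagePartitionStrategy[OrderLineUpdated] {
  require(numPartitions > 0, "numPartitions must be positive")

  def computeMessageKey(message: OrderLineUpdated): String =
    s"${message.orderId}/${message.lineId}"

  def computePartitionNumber(message: OrderLineUpdated): Option[Int] =
    Some(Math.floorMod(message.orderId.hashCode, numPartitions))
}

// Key-only strategy: returning None leaves the partition choice to the Kafka client.
object KeyOnlyStrategy extends MessagePartitionStrategy[OrderLineUpdated] {
  def computeMessageKey(message: OrderLineUpdated): String = message.orderId
  def computePartitionNumber(message: OrderLineUpdated): Option[Int] = None
}
```

Two updates to different lines of the same order would then get distinct keys but the same partition, which is the kind of independence the issue asks for.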

Of course, this would mean more changes, because this new type would replace the previous PartitionKeyStrategy (although they could also coexist for a while, by giving precedence to the first).

Does such an approach seem suitable? Do you have another idea for handling this link between message key and partition?

1 reaction
jroper commented, Apr 6, 2017

The approach we’ve taken to adding properties to a topic means that this should be straightforward to add without impacting existing APIs and functionality.


