Add manual assignment of TopicPartitions to Consumer
In using Kafka for stream processing, it's common to have to manually assign topic-partitions for consumption, rather than subscribing and receiving an assignment through the group.
For example, when processing messages from a partition the consumer subscribes to, you might have to keep some state. To make sure this state is durable, it's often replicated to a changelog, also implemented as a Kafka topic. That way, when the consumer is assigned another partition or crashes, the state can be restored. To make sure the right processing state is restored before processing continues, the changelog is written to the same partition number as the input partition being processed, a practice called copartitioning. By simply consuming the state-replication topic from the same partition for which you're processing messages, you're guaranteed to restore the right state.
In the above example, subscribing through a consumer group won't work: you've already been assigned a partition for the input topic, and you need exactly that same partition number for the changelog. There are other examples, too, like replicating a changelog into a local cache on every node, ready to be queried through HTTP. In that case, you want to consume all partitions, rather than be assigned just a few.
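To make the copartitioning guarantee concrete, here is a small self-contained sketch, with no Kafka client involved (the partition count, partitioner, and all names are illustrative):

```javascript
// Illustration of copartitioning: the changelog "topic" has the same
// partition count and partitioner as the input topic, so every state
// update for input partition N lands in changelog partition N.
const NUM_PARTITIONS = 3;

// Toy key partitioner; the same one must be used for input and changelog.
const partitionFor = (key) => {
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % NUM_PARTITIONS;
};

// changelog[p] stands in for the compacted changelog partition p.
const changelog = Array.from({ length: NUM_PARTITIONS }, () => new Map());

// Processing: count events per key, replicating each update to the changelog.
function process(state, key) {
  const next = (state.get(key) || 0) + 1;
  state.set(key, next);
  changelog[partitionFor(key)].set(key, next); // copartitioned write
}

const liveState = new Map();
for (const key of ['a', 'b', 'a', 'c']) process(liveState, key);

// Crash or reassignment: a consumer picking up input partition p can
// restore its state by replaying only changelog partition p.
function restore(p) {
  return new Map(changelog[p]);
}
```

Because the input and changelog share a partitioner and partition count, restoring partition N's state requires reading only changelog partition N — which is exactly why the consumer must be able to pick that partition manually.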
The documentation of the official Java client describes this as well:
In the previous examples, we subscribed to the topics we were interested in and let Kafka dynamically assign a fair share of the partitions for those topics based on the active consumers in the group. However, in some cases you may need finer control over the specific partitions that are assigned. For example:
- If the process is maintaining some kind of local state associated with that partition (like a local on-disk key-value store), then it should only get records for the partition it is maintaining on disk.
- If the process itself is highly available and will be restarted if it fails (perhaps using a cluster management framework like YARN, Mesos, or AWS facilities, or as part of a stream processing framework). In this case there is no need for Kafka to detect the failure and reassign the partition since the consuming process will be restarted on another machine.
Proposed solution
Like the Java KafkaConsumer does, allow calling consumer.assign instead of consumer.subscribe with an array of topic-partitions.
To make this practical, implementing consumer.assignments() might be necessary too, returning the list of topic-partitions assigned to the consumer (through either subscription or manual assignment).
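Usage of the proposal might look like the following sketch. None of this exists in the client today; `assign`/`assignments` and their argument shapes are hypothetical, modeled on the Java client:

```js
// Hypothetical API — manual assignment instead of group subscription.
const consumer = kafka.consumer({ groupId: 'restore-service' })

await consumer.connect()

// Instead of consumer.subscribe({ topic: 'changelog' }):
consumer.assign([
  { topic: 'changelog', partition: 3 }, // same partition as the input topic
])

consumer.assignments()
// would return [{ topic: 'changelog', partition: 3 }]

await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    // restore local state from the copartitioned changelog partition
  },
})
```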
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 9
- Comments: 24 (3 by maintainers)
Hi everyone. Has there been any progress here? As it stands, it is only possible to publish with a partition key, but on the consumption side you have to specify a partition number.
I think what was confusing for me about the API today was that memberAssignment is a Buffer; maybe it should be a more self-documenting type alias like EncodedMemberAssignment, thoughts? Also, it's confusing as to what userData is… For those wondering, here's what my single-partition assigner looks like:
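A minimal sketch of such a single-partition assigner, built on kafkajs's custom partition assigner interface — treat the exact `AssignerProtocol` encode shapes here as an assumption, not an authoritative implementation:

```js
const { AssignerProtocol } = require('kafkajs')
const { MemberMetadata, MemberAssignment } = AssignerProtocol

// Sketch: assign every member only partition 0 of each subscribed topic,
// as a workaround for pinning consumption to a single partition.
const SinglePartitionAssigner = ({ cluster }) => ({
  name: 'SinglePartitionAssigner',
  version: 1,

  // Runs on the group leader: map each member to partition 0.
  async assign({ members, topics }) {
    return members.map(({ memberId }) => ({
      memberId,
      memberAssignment: MemberAssignment.encode({
        version: 1,
        assignment: Object.fromEntries(topics.map((topic) => [topic, [0]])),
        userData: Buffer.alloc(0), // unused here
      }),
    }))
  },

  // Metadata each member sends when joining the group.
  protocol({ topics }) {
    return {
      name: 'SinglePartitionAssigner',
      metadata: MemberMetadata.encode({
        version: 1,
        topics,
        userData: Buffer.alloc(0),
      }),
    }
  },
})

// Usage (assumed):
// kafka.consumer({ groupId: 'g', partitionAssigners: [SinglePartitionAssigner] })
```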