Add manual assignment of TopicPartitions to Consumer
In using Kafka for stream processing, it's common to have to manually assign topic-partitions for consumption, rather than subscribing and receiving an assignment through the group.
For example, when processing messages from a partition the consumer subscribes to, you might have to keep some state. To make sure this state is durable, it's often replicated to a changelog, also implemented as a Kafka topic. That way, when the consumer is assigned another partition or crashes, the state can be restored. To make sure the right processing state is restored before processing continues, the changelog is written to the same partition number as the input partition being processed, a practice called copartitioning. By simply consuming the state-replication topic from the same partition for which you're processing messages, you're guaranteed to restore the right state.
In the above example, subscribing through a consumer group won't work: you've already been assigned a partition for the input topic, and you need exactly that same partition number for the changelog. There are other examples, too, like replicating a changelog into a local cache on every node, ready to be queried through HTTP. In that case, you want to consume all partitions, rather than be assigned just a few.
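To make the copartitioning guarantee concrete, here is a small self-contained sketch, with no Kafka client involved (the partition count, partitioner, and all names are illustrative):

```javascript
// Illustration of copartitioning: the changelog "topic" has the same
// partition count and partitioner as the input topic, so every state
// update for input partition N lands in changelog partition N.
const NUM_PARTITIONS = 3;

// Toy key partitioner; the same one must be used for input and changelog.
const partitionFor = (key) => {
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % NUM_PARTITIONS;
};

// changelog[p] stands in for the compacted changelog partition p.
const changelog = Array.from({ length: NUM_PARTITIONS }, () => new Map());

// Processing: count events per key, replicating each update to the changelog.
function process(state, key) {
  const next = (state.get(key) || 0) + 1;
  state.set(key, next);
  changelog[partitionFor(key)].set(key, next); // copartitioned write
}

const liveState = new Map();
for (const key of ['a', 'b', 'a', 'c']) process(liveState, key);

// Crash or reassignment: a consumer picking up input partition p can
// restore its state by replaying only changelog partition p.
function restore(p) {
  return new Map(changelog[p]);
}
```

Because the input and changelog share a partitioner and partition count, restoring partition N's state requires reading only changelog partition N — which is exactly why the consumer must be able to pick that partition manually.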
The documentation of the official Java client describes this as well:
In the previous examples, we subscribed to the topics we were interested in and let Kafka dynamically assign a fair share of the partitions for those topics based on the active consumers in the group. However, in some cases you may need finer control over the specific partitions that are assigned. For example:
- If the process is maintaining some kind of local state associated with that partition (like a local on-disk key-value store), then it should only get records for the partition it is maintaining on disk.
- If the process itself is highly available and will be restarted if it fails (perhaps using a cluster management framework like YARN, Mesos, or AWS facilities, or as part of a stream processing framework). In this case there is no need for Kafka to detect the failure and reassign the partition since the consuming process will be restarted on another machine.
Proposed solution
Like the Java KafkaConsumer does, allow calling consumer.assign instead of consumer.subscribe with an array of topic-partitions.
To make this practical, implementing consumer.assignments() might be necessary too, returning the list of topic-partitions assigned to the consumer (through either subscription or manual assignment).
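Usage of the proposal might look like the following sketch. None of this exists in the client today; `assign`/`assignments` and their argument shapes are hypothetical, modeled on the Java client:

```js
// Hypothetical API — manual assignment instead of group subscription.
const consumer = kafka.consumer({ groupId: 'restore-service' })

await consumer.connect()

// Instead of consumer.subscribe({ topic: 'changelog' }):
consumer.assign([
  { topic: 'changelog', partition: 3 }, // same partition as the input topic
])

consumer.assignments()
// would return [{ topic: 'changelog', partition: 3 }]

await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    // restore local state from the copartitioned changelog partition
  },
})
```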
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 9
- Comments: 24 (3 by maintainers)
Hi everyone. Has there been any progress here? As it stands, it is only possible to publish with a partition key, but on the consumption side you have to specify a partition number.
I think what was confusing for me about the API today was that memberAssignment is a Buffer; maybe it should be a more self-documenting type alias like EncodedMemberAssignment, thoughts? Also, it's confusing as to what userData is… For those wondering, here's what my single-partition assigner looks like:
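A minimal sketch of such a single-partition assigner, built on kafkajs's custom partition assigner interface — treat the exact `AssignerProtocol` encode shapes here as an assumption, not an authoritative implementation:

```js
const { AssignerProtocol } = require('kafkajs')
const { MemberMetadata, MemberAssignment } = AssignerProtocol

// Sketch: assign every member only partition 0 of each subscribed topic,
// as a workaround for pinning consumption to a single partition.
const SinglePartitionAssigner = ({ cluster }) => ({
  name: 'SinglePartitionAssigner',
  version: 1,

  // Runs on the group leader: map each member to partition 0.
  async assign({ members, topics }) {
    return members.map(({ memberId }) => ({
      memberId,
      memberAssignment: MemberAssignment.encode({
        version: 1,
        assignment: Object.fromEntries(topics.map((topic) => [topic, [0]])),
        userData: Buffer.alloc(0), // unused here
      }),
    }))
  },

  // Metadata each member sends when joining the group.
  protocol({ topics }) {
    return {
      name: 'SinglePartitionAssigner',
      metadata: MemberMetadata.encode({
        version: 1,
        topics,
        userData: Buffer.alloc(0),
      }),
    }
  },
})

// Usage (assumed):
// kafka.consumer({ groupId: 'g', partitionAssigners: [SinglePartitionAssigner] })
```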