Add manual assignment of TopicPartitions to Consumer

When using Kafka for stream processing, you commonly need to assign topic-partitions for consumption manually, rather than subscribing and receiving an assignment through the group.

For example, when processing messages from a partition the consumer subscribes to, you might have to keep some state. To make sure this state is durable, it’s often replicated to a changelog, also implemented by a Kafka topic. That way, when the consumer is assigned another partition or crashes, the state can be restored. To make sure the right processing state is restored before processing continues, the changelog is written to the same partition number as the input partition being processed, a practice called copartitioning. By simply consuming the state-replication topic from the same partition for which you’re processing messages, you’re guaranteed to restore the right state.

In the above example, subscribing to a ConsumerGroup won’t work: you’ve already been assigned a partition for the input topic and you need exactly that same partition number. There are other examples, too, like replicating a changelog as a local cache on every node, ready to be queried over HTTP. In that case, you want to consume all partitions, rather than just being assigned a couple.
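
To make the two cases concrete, here is a minimal sketch of how the partitions to assign manually could be computed. It touches no Kafka client at all: the TopicPartition shape, both helper names, and the topic names in the usage note are assumptions made up for illustration.

```typescript
// Illustrative shape for a topic-partition pair (an assumption, not a library type).
interface TopicPartition {
  topic: string;
  partition: number;
}

// Hypothetical helper for the copartitioning case: for every partition of the
// input topic this consumer holds, pick the same partition number on the
// changelog topic. Assignments for other topics are ignored.
function copartitionedChangelog(
  assigned: TopicPartition[],
  inputTopic: string,
  changelogTopic: string,
): TopicPartition[] {
  return assigned
    .filter((tp) => tp.topic === inputTopic)
    .map((tp) => ({ topic: changelogTopic, partition: tp.partition }));
}

// Hypothetical helper for the local-cache case: list every partition of a
// topic, given its partition count (which would come from topic metadata).
function allPartitions(topic: string, partitionCount: number): TopicPartition[] {
  return Array.from({ length: partitionCount }, (_, partition) => ({ topic, partition }));
}
```

For instance, a consumer holding partition 2 of a hypothetical "orders" topic would restore its state from partition 2 of "orders-changelog", while a cache node would pass the result of allPartitions to a manual assignment call.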

The documentation of the official Java client describes it as well:

In the previous examples, we subscribed to the topics we were interested in and let Kafka dynamically assign a fair share of the partitions for those topics based on the active consumers in the group. However, in some cases you may need finer control over the specific partitions that are assigned. For example:

  • If the process is maintaining some kind of local state associated with that partition (like a local on-disk key-value store), then it should only get records for the partition it is maintaining on disk.
  • If the process itself is highly available and will be restarted if it fails (perhaps using a cluster management framework like YARN, Mesos, or AWS facilities, or as part of a stream processing framework). In this case there is no need for Kafka to detect the failure and reassign the partition since the consuming process will be restarted on another machine.

Proposed solution

As the Java KafkaConsumer does, allow calling consumer.assign with an array of topic partitions instead of consumer.subscribe.

To make this practical, implementing consumer.assignments() might be necessary too, returning the list of topic partitions assigned to the consumer (through either subscription or manual assignment).
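
As a rough sketch of the shape this proposal could take, the interface below pairs the two methods. Everything here is hypothetical: the interface name, the signatures, and the in-memory stand-in are assumptions for illustration, not an API the library is known to provide. The stand-in assumes that a new manual assignment replaces the previous one, which is how the Java client's assign behaves.

```typescript
// Illustrative shape for a topic-partition pair (an assumption, not a library type).
interface TopicPartition {
  topic: string;
  partition: number;
}

// Hypothetical interface for the proposed feature. Neither method is
// confirmed to exist in the client under discussion.
interface ManuallyAssignableConsumer {
  // Replace the current assignment with exactly these topic-partitions.
  assign(topicPartitions: TopicPartition[]): void;
  // Return the topic-partitions currently assigned to this consumer.
  assignments(): TopicPartition[];
}

// In-memory stand-in that only tracks the assignment; it never talks to Kafka.
class InMemoryConsumer implements ManuallyAssignableConsumer {
  private current: TopicPartition[] = [];

  assign(topicPartitions: TopicPartition[]): void {
    // Copy the entries so later changes by the caller do not leak in.
    this.current = topicPartitions.map((tp) => ({ ...tp }));
  }

  assignments(): TopicPartition[] {
    // Return copies so callers cannot mutate the internal state.
    return this.current.map((tp) => ({ ...tp }));
  }
}
```

A real implementation would also have to decide how assignments() reports partitions received through a group subscription, which this sketch leaves out.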

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 9
  • Comments: 24 (3 by maintainers)

Top GitHub Comments

2 reactions
vertho commented, Apr 28, 2022

Hi everyone. Has there been any progress here? As it stands it is only possible to publish with a partition key, but on the consumption side you have to specify a partition number.

2 reactions
dwinrick-lever commented, Feb 17, 2022

I think what confused me about the API today was that memberAssignment is a Buffer. Maybe it should be a more self-documenting type alias like EncodedMemberAssignment. Thoughts?

export type GroupMemberAssignment = { memberId: string; memberAssignment: Buffer }

Also it’s confusing as to what userData is…

For those wondering, here’s what my single-partition assigner looks like:

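import { AssignerProtocol, PartitionAssigner } from 'kafkajs'

// `topic` and `partition` are expected to be defined in the enclosing scope.
const SinglePartitionAssigner: PartitionAssigner = () => ({
    name: 'SinglePartitionAssigner',
    version: 1,
    // Always hands out the one topic-partition, whatever the group looks like.
    async assign() {
        return [
            {
                memberId: 'what',
                memberAssignment: AssignerProtocol.MemberAssignment.encode({
                    version: this.version,
                    assignment: {
                        [topic]: [partition]
                    },
                    userData: Buffer.from([]) // no idea what this is for
                })
            }
        ]
    },
    // Metadata this member sends when joining the group.
    protocol({ topics }) {
        return {
            name: this.name,
            metadata: AssignerProtocol.MemberMetadata.encode({
                version: this.version,
                topics,
                userData: Buffer.from([]) // no idea what this is for
            }),
        }
    }
});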