Help: Commit can't be completed since group has already rebalanced...

Made the switch from pykafka to kafka-python over the weekend, which resolved an issue where my Producer would hang sending data to a Kafka cluster I don’t control.

An unforeseen consequence is that I can no longer commit my offsets. It seems to happen only for messages that take a while to process, though that could be an incorrect assumption, and I’ve seen some other messages processed that may be duplicates. I never noticed problems updating my offset with the other library, so I don’t think it has anything to do with Kafka broker settings; it is likely just something with my consumer.

For reference, I’m using kafka-python 1.2.2 and Kafka 0.9.

Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.

I’m simply creating a single consumer with a group, grabbing the first message available, processing it, and moving on.

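from kafka import KafkaConsumer

# Auto-commit is disabled, so offsets are committed manually below.
consumer = KafkaConsumer(topic, bootstrap_servers=server_list, group_id=group, enable_auto_commit=False)
for message in consumer:
    process_message(message.value)
    consumer.commit()  # commit after each processed message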

Issue Analytics

Created 7 years ago
Comments: 12 (5 by maintainers)

Top GitHub Comments

dpkp commented, Jul 6, 2016

How long does your process_message take in the worst case? Are you using the default heartbeat and session timeout parameters? It appears so from the consumer side, but you might verify that you haven’t modified the default server-side configs.
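One way to answer the worst-case question is to time each call directly. The following is a minimal sketch using only the standard library; WorstCaseTimer is a hypothetical helper written for illustration, not part of kafka-python.

```python
import time


class WorstCaseTimer:
    """Wraps a callable and remembers the longest time any single call took.

    Hypothetical helper for illustration; not part of kafka-python.
    """

    def __init__(self, func):
        self.func = func
        self.worst_seconds = 0.0
        self.calls = 0

    def __call__(self, *args, **kwargs):
        start = time.monotonic()
        try:
            return self.func(*args, **kwargs)
        finally:
            # Record the duration even if the wrapped call raises.
            elapsed = time.monotonic() - start
            self.calls += 1
            self.worst_seconds = max(self.worst_seconds, elapsed)


def slow_handler(value):
    """Stand-in for process_message; sleeps briefly to simulate work."""
    time.sleep(0.01)
    return value


timed = WorstCaseTimer(slow_handler)
for item in ("a", "b", "c"):
    timed(item)
print("calls:", timed.calls, "worst case (s):", round(timed.worst_seconds, 3))
```

In the consumer loop from the question, the same wrapper could be applied as timed = WorstCaseTimer(process_message), calling timed(message.value) in place of process_message(message.value) and logging timed.worst_seconds periodically.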

pykafka maintains a custom leader election / partition assignment system, and I don’t know the details well enough to comment on it. kafka-python attempts to implement exactly the same group coordination system, algorithms, and configuration parameters as the official Java client. But you have found one of the issues with the official implementation, namely that “long” message processing can cause unwanted group rebalance operations and interfere with offset commits. Rather than implement and maintain our own system here, I prefer to follow the official implementation. They are currently discussing / implementing a background heartbeat mechanism that should help address this. Until then, the recommendation is to tune your heartbeat and session timeouts relative to your worst-case message processing time.
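As a rough sketch of that tuning, the helper below derives both values from a measured worst-case processing time. The function name consumer_timeouts and the safety factor of 2 are assumptions made for illustration; the one-third ratio follows the usual Kafka guidance that the heartbeat interval should be no more than a third of the session timeout. The returned keys match the session_timeout_ms and heartbeat_interval_ms keyword arguments of KafkaConsumer.

```python
def consumer_timeouts(worst_case_seconds, safety_factor=2.0):
    """Suggest session and heartbeat timeouts from a worst-case processing time.

    Hypothetical helper: the safety factor is an assumed margin, not a
    value recommended by the kafka-python maintainers.
    """
    if worst_case_seconds <= 0:
        raise ValueError("worst_case_seconds must be positive")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")

    # The session timeout must comfortably exceed the slowest message.
    session_timeout_ms = int(worst_case_seconds * 1000 * safety_factor)
    # Keep the heartbeat interval at or below one third of the session timeout.
    heartbeat_interval_ms = session_timeout_ms // 3

    return {
        "session_timeout_ms": session_timeout_ms,
        "heartbeat_interval_ms": heartbeat_interval_ms,
    }


print(consumer_timeouts(5))
```

The resulting dictionary could be unpacked into the constructor, for example KafkaConsumer(topic, group_id=group, **consumer_timeouts(5)), provided the session timeout stays within the limits the broker allows.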

dpkp commented, Jul 6, 2016

group.max.session.timeout.ms is the broker configuration. It defaults to 30000 (30 seconds). If you increase that, you should be able to pass a larger value for session_timeout_ms to your KafkaConsumer instances.
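A minimal sketch of that check follows. It assumes you know the broker's group.max.session.timeout.ms (the 30000 default above is used unless you pass another value); checked_session_timeout and make_consumer are hypothetical helpers, and only the KafkaConsumer keyword arguments come from kafka-python. The kafka import is deferred into make_consumer so the validation part can run on its own.

```python
def checked_session_timeout(requested_ms, broker_max_ms=30000):
    """Return requested_ms if the broker limit allows it, else raise.

    broker_max_ms is an assumption about the broker's
    group.max.session.timeout.ms; verify it against the real config.
    """
    if requested_ms > broker_max_ms:
        raise ValueError(
            "session_timeout_ms=%d exceeds group.max.session.timeout.ms=%d; "
            "raise the broker setting first" % (requested_ms, broker_max_ms)
        )
    return requested_ms


def make_consumer(topic, servers, group, session_timeout_ms, broker_max_ms=30000):
    """Build a manually committing consumer with a validated session timeout."""
    # Imported here so the check above works without kafka-python installed.
    from kafka import KafkaConsumer

    return KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        group_id=group,
        enable_auto_commit=False,
        session_timeout_ms=checked_session_timeout(session_timeout_ms, broker_max_ms),
    )


# With the broker limit raised to 120 s, a 60 s session timeout passes the check.
print(checked_session_timeout(60000, broker_max_ms=120000))
```

Failing early on the client side is a design choice here: it surfaces a mismatch between consumer and broker settings before the consumer tries to join the group.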