KAFKA-3627: KafkaConsumer.poll() fails to execute delayed tasks in poll when records are available
My understanding of consumer failure detection (based on the official client docs; I couldn’t spot any discussion of this in the kafka-python docs) was that for a consumer to stay in a group it just needs to make sure the time between polls is no more than session_timeout_ms, and that the standard way to control this is max_poll_records.
However, it appears that with the kafka-python implementation, a call to KafkaConsumer.poll() doesn’t always cause a KafkaClient.poll(), and it is the latter that is required to stay in the group (i.e. heartbeats). Specifically, if enough data is already available to satisfy the KafkaConsumer.poll() it is returned without calling KafkaClient.poll(): https://github.com/dpkp/kafka-python/blob/master/kafka/consumer/group.py#L600
As far as I can tell, the official client doesn’t have this issue; it always calls ConsumerCoordinator.poll(): https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L1033
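To make the behaviour concrete, here is a toy model of the early return described above. It is not kafka-python's actual code: the ToyConsumer class, its attributes, and the _client_poll method are made up for illustration, and _client_poll only stands in for the KafkaClient.poll() call where heartbeats would be sent.

```python
class ToyConsumer:
    """Toy model (NOT kafka-python's real implementation): poll() skips the
    network client whenever already-buffered records can satisfy the call."""

    def __init__(self, buffered, max_poll_records):
        self.buffered = list(buffered)        # records already fetched
        self.max_poll_records = max_poll_records
        self.client_polls = 0                 # stands in for KafkaClient.poll() calls

    def _client_poll(self):
        # In a real client, heartbeats / delayed tasks would run here.
        self.client_polls += 1

    def poll(self):
        if self.buffered:
            # Early return: data is available, so the client is never polled.
            batch = self.buffered[:self.max_poll_records]
            self.buffered = self.buffered[self.max_poll_records:]
            return batch
        self._client_poll()
        return []


consumer = ToyConsumer(buffered=range(10), max_poll_records=2)
consumer_polls = 0
while consumer.buffered:
    consumer.poll()
    consumer_polls += 1

# Five consumer-level polls drained the buffer without a single client poll.
print(consumer_polls, consumer.client_polls)
```

In this model, however small max_poll_records is, nothing that would keep the consumer in the group runs until the buffer is empty.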
This issue can be worked around by using fetch_max_bytes to control the time between KafkaClient.poll()s, rather than max_poll_records to control the time between KafkaConsumer.poll()s. This doesn’t seem like an ideal solution, however. A lot of my consumers process very small messages, and a LOT of them fit in the default 50MB fetch_max_bytes, so I have to set it a lot smaller, which seems like it could have adverse effects on performance?
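A rough sketch of how the workaround value might be sized. The helper fetch_max_bytes_for and the numbers (200-byte messages, 5 ms of processing each, a 50% safety margin) are assumptions for illustration only; it also assumes uniform message size and processing time and ignores protocol overhead. Only fetch_max_bytes and session_timeout_ms are actual KafkaConsumer options.

```python
def fetch_max_bytes_for(session_timeout_ms, avg_message_bytes, per_message_ms, safety=0.5):
    """Hypothetical helper: largest fetch size whose processing time fits
    within a fraction (safety) of the session timeout."""
    budget_ms = session_timeout_ms * safety
    max_messages = int(budget_ms // per_message_ms)
    # Always allow at least one message per fetch.
    return max(1, max_messages) * avg_message_bytes


# Assumed workload: 200-byte messages, 5 ms of processing per message.
config = {
    "session_timeout_ms": 30000,
    "fetch_max_bytes": fetch_max_bytes_for(
        30000, avg_message_bytes=200, per_message_ms=5
    ),
}

# These keyword arguments could then be passed on, e.g. KafkaConsumer(topic, **config).
print(config["fetch_max_bytes"])
```

With these assumed numbers the result is about 600 KB, far below the 50MB default, which is exactly the performance concern raised above.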
(Semi-related to https://github.com/dpkp/kafka-python/issues/948, but as I understand it, even with background heartbeats you need to poll at least every max.poll.interval.ms, so this would still be an issue.)
Issue Analytics
- Created 7 years ago
- Comments: 6 (5 by maintainers)
Good catch - this was originally fixed in the Java client via KAFKA-3627. Since then they have switched to a background thread implementation. kafka-python has not implemented either approach yet. I’ve renamed this issue to mirror KAFKA-3627. See #948 regarding heartbeats via a background thread.
The delayed tasks code was removed in favor of the background thread, so I’m going to close this.