Heartbeats cannot be sent manually until messages have been consumed
This issue is of the same nature as #378: it’s not possible to send heartbeats before having consumed any messages, as that possibility is only exposed as part of eachBatch.
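For context, here is a minimal sketch of where the heartbeat is reachable today: as one of the arguments KafkaJS passes to an eachBatch handler. The handler is written as a standalone function so it can run without a broker, and `processMessage` is a hypothetical placeholder for the application's own work.

```javascript
// Hypothetical stand-in for the application's per-message work.
async function processMessage(message) {
  return message.value.toString();
}

// Shape of an eachBatch handler: `heartbeat` arrives together with the batch,
// so it can only be called once a batch has actually been fetched.
async function eachBatch({ batch, resolveOffset, heartbeat }) {
  for (const message of batch.messages) {
    await processMessage(message);
    resolveOffset(message.offset); // mark this message as processed
    await heartbeat(); // signal to the group coordinator that we are alive
  }
}
```

A handler like this would be registered with `consumer.run({ eachBatch })`. Until the first batch arrives, there is no `heartbeat` function in scope to call.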
It’s common within stream processing tasks to pause the fetching of messages and not process any until some work is performed, like restoring a processing state store. Especially in that case, it’s easy for this preparation work to outlast the sessionTimeout, causing a rebalance and restarting the process all over again. This can lead to no processing ever happening, with the consumers stuck in a rebalance loop.
The only way for the consumer to keep its assignment is to send heartbeats, indicating that it hasn’t died and preventing the group coordinator from triggering a rebalance.
Proposed solution
Introduce a consumer.heartbeat method. Like consumer.seek, it could be callable only after consumer.run has established the internal consumerGroup.
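The guard described here could look roughly like the toy model below. This is only a sketch of the proposed behaviour, not KafkaJS internals: `SketchConsumer`, the stub consumer group, and the error message are all made up for illustration.

```javascript
// Toy model of the proposal; none of this is actual KafkaJS code.
class SketchConsumer {
  constructor() {
    this.consumerGroup = null; // only set once run() has been called
  }

  async run() {
    // Assumption: a stub standing in for the internal consumer group.
    this.consumerGroup = { heartbeat: async () => "heartbeat sent" };
  }

  async heartbeat() {
    // Same kind of precondition the issue describes for consumer.seek.
    if (!this.consumerGroup) {
      throw new Error("consumer.run must be called before consumer.heartbeat");
    }
    return this.consumerGroup.heartbeat();
  }
}
```

Calling `heartbeat()` before `run()` rejects with the error, and calling it afterwards delegates to the consumer group.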
The Java KafkaConsumer doesn’t have such a method, but that’s because it relies on calling consumer.poll in combination with max.poll.interval.ms. KIP-62, where this was added, actually mentions a manual heartbeat call as a possible extension, should that granularity be needed:
Add a separate API the user can call to indicate liveness: We considered adding a heartbeat() API which the user could use from their own thread in order to keep the consumer alive. This also solves the problem, but it puts the burden of managing that thread (including shutdown coordination) on the user. Although there is some advantage to having a separate API since it allows users to develop their own notion of liveness, we feel must users would simply spawn a thread and call heartbeat() in a loop. We leave this as a possible extension for the future if users find they need it.
Given that KafkaJS already has such a heartbeat API (which actually makes this a lot easier to deal with), being able to call it on the consumer seems to make sense!
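The "call heartbeat() in a loop" pattern from the KIP-62 quote might translate to JavaScript along these lines. This is a sketch under assumptions: `withHeartbeats` is a hypothetical helper, and `heartbeat` stands for whatever function ends up being exposed (for instance the proposed consumer.heartbeat).

```javascript
// Runs `work()` while calling `heartbeat` every `intervalMs` milliseconds.
// A heartbeat failure is remembered and rethrown once `work()` has finished.
async function withHeartbeats(heartbeat, intervalMs, work) {
  let failure = null;
  const timer = setInterval(() => {
    Promise.resolve()
      .then(() => heartbeat())
      .catch((err) => { failure = err; });
  }, intervalMs);

  try {
    const result = await work();
    if (failure) throw failure;
    return result;
  } finally {
    clearInterval(timer); // always stop the timer, even if work() throws
  }
}
```

Under these assumptions, restoring a state store would be wrapped as `await withHeartbeats(heartbeat, intervalMs, restoreStateStore)`, where `restoreStateStore` is the application's own long-running function and `intervalMs` is chosen well below the sessionTimeout.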
See this for a workaround, and a word of caution: https://github.com/tulios/kafkajs/pull/473#issuecomment-566947785
Hi @JaapRood @Nevon, I’m running into the exact same issue described here, where back pressure is causing us to miss heartbeats, ending in a rebalance loop. Is there a workaround? Thanks!