Heartbeats cannot be sent manually until messages have been consumed

This issue is similar in nature to #378: it's not possible to send heartbeats before any messages have been consumed, because the heartbeat function is only exposed as part of eachBatch.

In stream processing it's common to pause fetching and hold off processing until some preparation work is done, such as restoring a state store. That preparation can easily outlast the sessionTimeout, which triggers a rebalance and restarts the whole process. In the worst case no processing ever happens, and the consumers are stuck in a rebalance loop.

The only way to keep the assignment is to send heartbeats, signalling that the consumer is still alive so the group coordinator does not trigger a rebalance.
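To make the failure mode concrete, here is a toy model of the session timeout in virtual time. It is a sketch under simplifying assumptions, not how a real group coordinator behaves: a member is evicted as soon as a gap between heartbeats exceeds sessionTimeoutMs, and an evicted member restarts its preparation from scratch. The function simulatePreparation and its parameters are hypothetical, and the timings are illustrative values only.

```typescript
// Toy model of a consumer group session timeout, in virtual time (no real timers).
// Assumptions: the coordinator evicts a member as soon as the gap between two
// heartbeats exceeds sessionTimeoutMs, and an evicted member restarts its
// preparation work from scratch after rejoining. Real brokers are more nuanced.

interface SimulationResult {
  completed: boolean; // did the preparation work ever finish?
  rebalances: number; // how many times the member was evicted and had to restart
}

function simulatePreparation(
  prepMs: number,
  sessionTimeoutMs: number,
  heartbeatIntervalMs: number | null, // null: no heartbeats while preparing
  maxAttempts = 5,
): SimulationResult {
  let rebalances = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Joining the group counts as a sign of life at t = 0. Without heartbeats the
    // only gap is the whole preparation; with them, no gap exceeds the interval.
    const longestGap =
      heartbeatIntervalMs === null ? prepMs : Math.min(prepMs, heartbeatIntervalMs);
    if (longestGap <= sessionTimeoutMs) {
      return { completed: true, rebalances };
    }
    // Session expired mid-preparation: evicted, rebalanced, start over.
    rebalances++;
  }
  return { completed: false, rebalances };
}

// 60 s of state restoration against a 30 s session timeout (illustrative values).
console.log(simulatePreparation(60_000, 30_000, null)); // { completed: false, rebalances: 5 }
console.log(simulatePreparation(60_000, 30_000, 3_000)); // { completed: true, rebalances: 0 }
```

Every attempt without heartbeats fails in exactly the same way, which is the rebalance loop described above; heartbeating during preparation is what breaks it.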

Proposed solution

Introduce a consumer.heartbeat method. Like consumer.seek, it could only be callable after consumer.run has set up the internal consumerGroup.
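A minimal sketch of what such a guarded method could look like, assuming a stand-in interface for the internal consumer group. Every name here (ConsumerGroupLike, HeartbeatingConsumer, attachGroup, sendHeartbeat) is hypothetical and not part of KafkaJS; the throttling to at most one heartbeat per heartbeatIntervalMs is also an assumption of this sketch.

```typescript
// Hypothetical sketch of the proposed consumer.heartbeat(). None of these names
// (ConsumerGroupLike, HeartbeatingConsumer, attachGroup, sendHeartbeat) are KafkaJS APIs.

interface ConsumerGroupLike {
  sendHeartbeat(): Promise<void>; // stands in for the internal consumer group
}

class HeartbeatingConsumer {
  private group: ConsumerGroupLike | null = null;
  private lastHeartbeatAt = -Infinity;
  private readonly heartbeatIntervalMs: number;
  private readonly now: () => number;

  constructor(heartbeatIntervalMs: number, now: () => number = Date.now) {
    this.heartbeatIntervalMs = heartbeatIntervalMs;
    this.now = now;
  }

  // Stands in for the moment consumer.run() has set up the consumer group.
  attachGroup(group: ConsumerGroupLike): void {
    this.group = group;
  }

  // Resolves to true if a heartbeat was sent, false if it was skipped as too recent.
  async heartbeat(): Promise<boolean> {
    if (this.group === null) {
      // The guard proposed in the issue, analogous to consumer.seek.
      throw new Error("Consumer group not initialized: call run() before heartbeat()");
    }
    const t = this.now();
    // Assumption of this sketch: never heartbeat more often than heartbeatIntervalMs.
    if (t - this.lastHeartbeatAt < this.heartbeatIntervalMs) {
      return false;
    }
    await this.group.sendHeartbeat();
    this.lastHeartbeatAt = t;
    return true;
  }
}
```

With the throttle in place, a state-store restore loop could simply await heartbeat() between chunks of work without worrying about flooding the coordinator.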

The Java KafkaConsumer doesn't have such a method, because it relies on calling consumer.poll in combination with max.poll.interval.ms. KIP-62, which introduced that setting, explicitly considers a manual heartbeat API as an alternative, in case that finer granularity turns out to be needed:

Add a separate API the user can call to indicate liveness: We considered adding a heartbeat() API which the user could use from their own thread in order to keep the consumer alive. This also solves the problem, but it puts the burden of managing that thread (including shutdown coordination) on the user. Although there is some advantage to having a separate API since it allows users to develop their own notion of liveness, we feel most users would simply spawn a thread and call heartbeat() in a loop. We leave this as a possible extension for the future if users find they need it.
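KIP-62 expects most users to call heartbeat() in a loop from a separate thread. In Node.js the closest equivalent is a timer on the event loop. Below is a minimal sketch of such a helper, assuming heartbeat is any async function, for example the heartbeat callback KafkaJS passes to eachBatch, or the proposed consumer.heartbeat(). The name withHeartbeats is hypothetical.

```typescript
// Hypothetical helper: keeps heartbeats going while a long async task runs,
// then stops them whether the task succeeds or fails.
async function withHeartbeats<T>(
  work: () => Promise<T>,
  heartbeat: () => Promise<void>,
  intervalMs: number,
): Promise<T> {
  const timer = setInterval(() => {
    // Log heartbeat failures instead of letting them become unhandled rejections.
    heartbeat().catch((err) => console.error("heartbeat failed:", err));
  }, intervalMs);
  try {
    return await work();
  } finally {
    clearInterval(timer); // shutdown coordination: no heartbeats after work ends
  }
}
```

One caveat of this design: setInterval callbacks only run when the event loop is free, so heartbeats keep flowing only while work is awaiting I/O. Long synchronous, CPU-bound work blocks the timer, so such work would need to be split into chunks that yield back to the event loop.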

Given that KafkaJS does have such a heartbeat API (which actually makes this a lot easier to deal with), exposing it on the consumer seems to make sense!

Issue analytics: open, created 4 years ago, 5 comments.

Top GitHub comments

Nevon commented, Aug 20, 2020 (1 reaction)

See this for a workaround, and a word of caution: https://github.com/tulios/kafkajs/pull/473#issuecomment-566947785

HunderlineK commented, Aug 20, 2020

Hi @JaapRood @Nevon, I'm running into the exact same issue described here: back pressure is causing us to miss heartbeats, ending in a rebalance loop. Is there a workaround? Thanks!
