OffsetCommitRequest timeout causes consumer rebalancing
Description
Hello,
We have been using the latest Kafka client library (Confluent.Kafka 1.2.0) with default settings.
Our typical Kafka topic consumption loop reads one event at a time and commits its offset before reading the next.
Recently we have noticed a lot of seemingly random "Broker: Unknown member"
exceptions while committing event offsets.
The logs say:
{"Message":"[thrd:GroupCoordinator]:
GroupCoordinator/3: Timed out HeartbeatRequest in flight (after 10963ms, timeout #0): possibly held back by preceeding OffsetCommitRequest with timeout in 48457ms",
"ClientInstance":"rdkafka#consumer-1","Facility":"REQTMOUT"}
then this:
{"Message":"[thrd:GroupCoordinator]:
GroupCoordinator/3: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests",
"ClientInstance":"rdkafka#consumer-1","Facility":"REQTMOUT"}
And finally this happens (because of the rebalance):
Broker: Unknown member --->
Confluent.Kafka.KafkaException: Broker: Unknown member
   at Confluent.Kafka.Impl.SafeKafkaHandle.Commit(IEnumerable`1 offsets)
   at Confluent.Kafka.Consumer`2.Commit(ConsumeResult`2 result)
I’m wondering why there is a preceding OffsetCommitRequest still in flight
if we commit offsets sequentially, one at a time.
Could you please help us figure out what is happening?
How to reproduce
NuGet packages installed:
<PackageReference Include="Confluent.Kafka" Version="1.2.0" />
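while (true)
{
    // Wait up to 500 ms for the next message.
    var consumeResult = _consumer.Consume(TimeSpan.FromMilliseconds(500));
    if (consumeResult == null)
    {
        // No message arrived within the timeout.
        return;
    }
    // Synchronously commit the offset of the message just consumed.
    _consumer.Commit(consumeResult);
}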
Issue Analytics
- State:
- Created 4 years ago
- Comments:17 (8 by maintainers)
Top GitHub Comments
@aouakki, @oleg-orlenko In my company we have been migrating everything to the much more stable Go-based client https://github.com/Shopify/sarama
@alex-namely - We see a lot of people migrating from sarama to the Confluent Go client for the same reason. The Confluent Go client is used heavily by some of the largest users of Kafka. I can’t name names, but you’re most likely using more than one product powered by it.
@aouakki - We’re looking into this.