OffsetCommitRequest timeout causes consumers rebalancing

See original GitHub issue

Description

Hello, we have been using the latest Kafka client library (1.2.0) with default settings. Our typical Kafka topic consumption loop reads an event and commits it, one event at a time. Recently we have noticed a lot of random Broker: Unknown member exceptions while committing event offsets.

The logs say:

{"Message":"[thrd:GroupCoordinator]: 
GroupCoordinator/3: Timed out HeartbeatRequest in flight (after 10963ms, timeout #0): possibly held back by preceeding OffsetCommitRequest with timeout in 48457ms",
"ClientInstance":"rdkafka#consumer-1","Facility":"REQTMOUT"} 

then this:

{"Message":"[thrd:GroupCoordinator]: 
GroupCoordinator/3: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests",
"ClientInstance":"rdkafka#consumer-1","Facility":"REQTMOUT"} 

And finally this happens (because of rebalancing):

Broker: Unknown member --->
Confluent.Kafka.KafkaException: Broker: Unknown member
   at Confluent.Kafka.Impl.SafeKafkaHandle.Commit(IEnumerable`1 offsets)
   at Confluent.Kafka.Consumer`2.Commit(ConsumeResult`2 result)

I’m wondering why the log reports a preceding OffsetCommitRequest if we just commit offsets sequentially, one by one.

Could you please help us figure out what is happening?

How to reproduce

NuGet packages installed: <PackageReference Include="Confluent.Kafka" Version="1.2.0" />

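while (true)
{
    // Wait up to 500 ms for the next message; Consume returns null on timeout.
    var consumeResult = _consumer.Consume(TimeSpan.FromMilliseconds(500));
    if (consumeResult == null)
    {
        return;
    }
    // Synchronous commit of this single message's offset; this is the call
    // that throws "Broker: Unknown member" in the stack trace above.
    _consumer.Commit(consumeResult);
}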

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

3 reactions
alex-namely commented, Mar 6, 2020

@aouakki, @oleg-orlenko At my company we have been migrating everything to the much more stable Go-based client https://github.com/Shopify/sarama

1 reaction
mhowlett commented, Mar 6, 2020

@alex-namely - we see a lot of people migrating to the confluent go client from sarama for the same reason. the confluent go client is used heavily by some of the largest users of kafka. can’t name names, but you’re most likely using more than one product powered by it.

@aouakki - we’re looking into this.
