Consumer starts with incorrect (much lower) offset after rebalance
Description
About a month ago we updated the Confluent.Kafka library from 1.5.3 to 1.7.0 (the latest version at the time) and started to experience offset issues after rebalancing. The issues occur randomly across different consumers, topics, and partitions.
During application deployment (application start), after the rebalance takes place, the affected partition is assigned to a consumer, but that consumer starts consuming at an offset thousands lower than the last committed offset, which causes many duplicates to be processed.
For example, when we check the __consumer_offsets messages after the incident, we can see the following messages for one partition [xyz.eshop,xyz.eshop.async-requests,17]:
- Last committed offset (Wednesday, August 4, 2021 6:10:01.506 PM): 134384
- First committed offset after reassignment (Wednesday, August 4, 2021 6:10:27.430 PM): 133632
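To put the two records above in perspective, the following short sketch works out how far the committed offset moved backwards and how much time passed between the two commits. It uses only the values quoted above; the variable names are illustrative.

```csharp
using System;
using System.Globalization;

// Values copied from the two __consumer_offsets records quoted above.
long lastCommittedBeforeRebalance = 134384;
long firstCommittedAfterRebalance = 133632;

// Timestamps of the two commit records (August 4, 2021, PM times).
var before = new DateTime(2021, 8, 4, 18, 10, 1, 506);
var after = new DateTime(2021, 8, 4, 18, 10, 27, 430);

long rewind = lastCommittedBeforeRebalance - firstCommittedAfterRebalance;
double gapSeconds = (after - before).TotalSeconds;

Console.WriteLine($"Committed offset moved back by: {rewind}"); // 752
Console.WriteLine($"Time between the two commits: {gapSeconds.ToString("F3", CultureInfo.InvariantCulture)} s"); // 25.924 s
```

In this particular example the committed offset went back by 752 within about 26 seconds, so at least 752 messages on this partition were consumed a second time.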
How to reproduce
We have not yet found a way to reproduce it locally. The problem is new and occurs only occasionally during or after a rebalance, probably during reassignment. We are not even sure it is connected to the library update itself, but we would appreciate any hints.
Checklist
Please provide the following information:
- A complete (i.e. we can run it), minimal program demonstrating the problem. No need to supply a project file.
- Confluent.Kafka nuget version - 1.7.0
- Apache Kafka version - 1.0.2-cp2
- Client configuration
  Number of brokers: 3
  Number of partitions: 21
  Number of consumers in a consumer group: 4
  Consumer configuration:

    var consumerConfig = new ConsumerConfig
    {
        BootstrapServers = servers,
        ClientId = Environment.MachineName,
        GroupId = groupId,
        EnableAutoCommit = true,
        EnableAutoOffsetStore = false,
        AutoOffsetReset = AutoOffsetReset.Latest
    };
  - We have no assignment or revocation handlers set.
  - Assignment should happen during poll as a side effect.
- Operating system - Debian 9.13, 4.9.0-14-amd64
- Broker log (affected is consumer group xyz.eshop): controller.log, kafka-authorizer.log, server.log
- __consumer_offsets log (affected is [xyz.eshop,xyz.eshop.async-requests,17]): outputoffsets_23.txt
- Critical issue
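For readers trying to narrow this down, here is a minimal sketch of a consumer that uses the configuration from the checklist and additionally logs partition assignment and revocation. It is not the reporter's application code: the broker address, group id, topic name, and class name are placeholder assumptions, and the logging handlers are an addition for diagnostics (the report states that no handlers were set). With EnableAutoOffsetStore = false, an offset is only eligible for the background auto-commit once the application stores it with StoreOffset.

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

class OffsetDiagnosticsConsumer
{
    static void Main()
    {
        // Placeholder values (assumptions), replace with your own.
        var servers = "localhost:9092";
        var groupId = "example-group";
        var topic = "example-topic";

        var consumerConfig = new ConsumerConfig
        {
            BootstrapServers = servers,
            ClientId = Environment.MachineName,
            GroupId = groupId,
            EnableAutoCommit = true,        // stored offsets are committed periodically in the background
            EnableAutoOffsetStore = false,  // the application decides which offsets get stored
            AutoOffsetReset = AutoOffsetReset.Latest
        };

        using var cts = new CancellationTokenSource();
        Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };

        using var consumer = new ConsumerBuilder<Ignore, string>(consumerConfig)
            // Diagnostic logging only; the handlers do not change the assignment.
            .SetPartitionsAssignedHandler((c, partitions) =>
                Console.WriteLine($"Assigned: [{string.Join(", ", partitions)}]"))
            .SetPartitionsRevokedHandler((c, offsets) =>
                Console.WriteLine($"Revoked: [{string.Join(", ", offsets)}]"))
            .SetErrorHandler((c, e) => Console.WriteLine($"Error: {e.Reason}"))
            .Build();

        consumer.Subscribe(topic);
        try
        {
            while (!cts.IsCancellationRequested)
            {
                var result = consumer.Consume(cts.Token);
                Console.WriteLine($"Consumed {result.TopicPartitionOffset}");

                // ... process result.Message.Value here ...

                // Mark the message as processed so the next auto-commit can include it.
                consumer.StoreOffset(result);
            }
        }
        catch (OperationCanceledException)
        {
            // Ctrl+C was pressed; fall through to Close().
        }
        finally
        {
            consumer.Close(); // leave the group cleanly
        }
    }
}
```

Comparing the "Consumed" lines that follow an "Assigned" line with the offsets recorded in __consumer_offsets should show whether the consumer really resumed below the last committed offset.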
Issue Analytics
- Created 2 years ago
- Comments:42 (16 by maintainers)
Top GitHub Comments
fix: https://github.com/edenhill/librdkafka/pull/3774
Hi, adding to this thread as we have exactly the same problem using both Kafka nuget v1.6.3 and v1.8.2. We were initially using a consumer with auto-committing every 5000ms (in fact the same config settings as @georgievaja) and experiencing the same problem every time we added or removed another consumer for the given topics.
To diagnose this issue we tried many different combinations of consumer config options, and eventually also wrote a consumer that commits manually after every message is processed. It still shows the same problem: the offset changes after a rebalance.
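As an illustration of what committing manually after every message typically looks like with Confluent.Kafka, here is a minimal sketch. It is not the commenter's consumer; the broker address, group id, topic name, and class name are placeholder assumptions.

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

class ManualCommitConsumer
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // placeholder (assumption)
            GroupId = "example-group",           // placeholder (assumption)
            EnableAutoCommit = false,            // offsets are committed explicitly below
            AutoOffsetReset = AutoOffsetReset.Latest
        };

        using var cts = new CancellationTokenSource();
        Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };

        using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
        consumer.Subscribe("example-topic"); // placeholder (assumption)

        try
        {
            while (!cts.IsCancellationRequested)
            {
                var result = consumer.Consume(cts.Token);

                // ... process result.Message.Value here ...

                try
                {
                    // Synchronously commits the offset following this message.
                    consumer.Commit(result);
                }
                catch (KafkaException e)
                {
                    Console.WriteLine($"Commit failed: {e.Error.Reason}");
                }
            }
        }
        catch (OperationCanceledException)
        {
            // Ctrl+C was pressed; fall through to Close().
        }
        finally
        {
            consumer.Close();
        }
    }
}
```

A synchronous commit per message costs throughput, but it rules out the auto-commit timer as a factor when investigating the offset jump.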
Our setup that exhibits the issue is:
- Consumers: 1 (initially)
- Brokers: 2
- Kafka nuget versions: 1.6.3 and 1.8.2
- Kafka version: 2.7.0
This is a simplified version of our consumer:
Attached are logs for the time periods relevant to this issue: MSK-logs.csv
Happy to provide any extra details as needed to help diagnose this.