
Consumer starts with incorrect (much lower) offset after rebalance

See original GitHub issue

Description

About a month ago we updated the Confluent.Kafka library from 1.5.3 to the latest 1.7.0, and we started to experience offset issues after rebalancing. Those issues happen randomly on different consumers, topics and partitions.

During application deployment (application start), after the rebalance takes place, the affected partition is assigned to a consumer, but that consumer starts consuming at an offset thousands lower than the last committed offset, which causes many duplicates to be processed. For example, when we check the __consumer_offsets messages after the incident, we see the following for one partition [xyz.eshop,xyz.eshop.async-requests,17]:

Last committed offset: 134384 (Wednesday, August 4, 2021 6:10:01.506 PM)

First committed offset after reassignment: 133632 (Wednesday, August 4, 2021 6:10:27.430 PM)

How to reproduce

We have not yet found a way to reproduce it locally; the problem is new and occurs only occasionally during/after a rebalance, probably during reassignment. We are not even sure it is connected with the library update itself, but we would appreciate any hints.
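
One way to cross-check what the group coordinator has recorded, independently of dumping __consumer_offsets, is to query the committed offset for the affected partition from a client. Below is a minimal sketch using Confluent.Kafka's Committed API; the bootstrap server is a placeholder, and the group, topic and partition are the ones named above:

using System;
using Confluent.Kafka;

class CommittedOffsetCheck
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "broker1:9092",   // placeholder
            GroupId = "xyz.eshop"                // affected consumer group from the report
        };

        using var consumer = new ConsumerBuilder<Ignore, Ignore>(config).Build();

        // Ask the group coordinator for the committed offset of the affected partition.
        var partition = new TopicPartition("xyz.eshop.async-requests", 17);
        var committed = consumer.Committed(new[] { partition }, TimeSpan.FromSeconds(10));

        foreach (var tpo in committed)
            Console.WriteLine($"{tpo.TopicPartition}: committed offset {tpo.Offset}");
    }
}

Comparing this value shortly before and after a rebalance should show whether the committed offset itself moved backwards or whether the consumer merely started fetching from an older position.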

Checklist

Please provide the following information:

  • A complete (i.e. we can run it), minimal program demonstrating the problem. No need to supply a project file.
  • Confluent.Kafka nuget version - 1.7.0
  • Apache Kafka version - 1.0.2-cp2
  • Client configuration

Number of brokers: 3
Number of partitions: 21
Number of consumers in a consumer group: 4
Consumer configuration:

var consumerConfig = new ConsumerConfig
{
    BootstrapServers = servers,
    ClientId = Environment.MachineName,
    GroupId = groupId,
    EnableAutoCommit = true,
    EnableAutoOffsetStore = false,
    AutoOffsetReset = AutoOffsetReset.Latest
};
  • we have no assignment or revocation handlers set

  • assignment should happen during poll as a side effect (a sketch of the consume loop this configuration implies follows the checklist)

  • Operating system - Debian 9.13, kernel 4.9.0-14-amd64

  • Broker logs (the affected consumer group is xyz.eshop): controller.log, kafka-authorizer.log, server.log

  • __consumer_offsets log (the affected partition is [xyz.eshop,xyz.eshop.async-requests,17]): outputoffsets_23.txt

  • Critical issue
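
For context on the configuration above: with EnableAutoCommit = true and EnableAutoOffsetStore = false, the client's background thread auto-commits only offsets that the application has explicitly stored, so the consume loop is expected to call StoreOffset after each successfully processed message. The following is a minimal sketch of such a loop (not code from the report; servers, groupId, the topic name and Process are placeholders):

using System;
using System.Threading;
using Confluent.Kafka;

class ConsumeLoopSketch
{
    // Placeholder for the application's message handling.
    static void Process(ConsumeResult<string, string> cr) { }

    static void Run(string servers, string groupId, CancellationToken token)
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = servers,
            ClientId = Environment.MachineName,
            GroupId = groupId,
            EnableAutoCommit = true,        // background auto-commit of stored offsets
            EnableAutoOffsetStore = false,  // nothing is stored until StoreOffset is called
            AutoOffsetReset = AutoOffsetReset.Latest
        };

        using var consumer = new ConsumerBuilder<string, string>(config).Build();
        consumer.Subscribe("xyz.eshop.async-requests");   // placeholder topic

        try
        {
            while (!token.IsCancellationRequested)
            {
                var cr = consumer.Consume(token);
                Process(cr);
                consumer.StoreOffset(cr);   // mark the offset as eligible for the next auto-commit
            }
        }
        catch (OperationCanceledException)
        {
            // Consume throws when the token is cancelled while waiting.
        }

        consumer.Close();
    }
}

The key point is that, under this configuration, only offsets passed to StoreOffset are ever auto-committed.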

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 42 (16 by maintainers)

Top GitHub Comments

6 reactions
mhowlett commented, Mar 22, 2022

6 reactions
barndawe commented, Dec 23, 2021

Hi, adding to this thread as we have exactly the same problem using both Kafka nuget v1.6.3 and v1.8.2. We were initially using a consumer with auto-committing every 5000ms (in fact the same config settings as @georgievaja) and experiencing the same problem every time we added or removed another consumer for the given topics.

In an effort to diagnose this issue we tried many different combinations of consumer config options, and eventually also wrote a consumer that manually commits after every message is processed; this still has the same issue of the offset changing after a rebalance.

Our setup that exhibits the issue is:
Consumers: 1 (initially)
Brokers: 2
Kafka nuget versions: 1.6.3 and 1.8.2
Kafka version: 2.7.0

This is a simplified version of our consumer:

void Consume(KafkaOptions options)
{
    var config = new ConsumerConfig()
    {
        BootstrapServers = options.BootstrapServers,
        GroupId = options.GroupId,
        PartitionAssignmentStrategy = PartitionAssignmentStrategy.CooperativeSticky,
        EnableAutoCommit = false,
        EnableAutoOffsetStore = false,
        AutoOffsetReset = AutoOffsetReset.Latest,
        AllowAutoCreateTopics = false,
    };

    var consumer = new ConsumerBuilder<string, string>(config).Build();

    // topicList: the topics this consumer handles (defined elsewhere in the real application)
    consumer.Subscribe(topicList);

    var token = new CancellationToken();

    while (!token.IsCancellationRequested)
    {
        var cr = consumer.Consume(token);

        // In this case ProcessMessage is simply creating a different kind of message and publishing it again
        ProcessMessage(cr);

        // With EnableAutoCommit = false, StoreOffset only updates the in-memory offset
        // store; the explicit Commit call below is what actually commits the offset.
        consumer.StoreOffset(cr);
        consumer.Commit(cr);
        Logger.LogInformation(
            "Stored and committed Topic: {Topic}, Partition: {Partition}, Offset: {Offset}", cr.Topic,
            cr.Partition, cr.Offset);
    }

    consumer.Close();
    consumer.Dispose();
}

Attached are logs for the time periods relevant to this issue: MSK-logs.csv

  • Between 9:18:38AM and 9:19:00AM the messages were originally created by an external producer and published, then processed successfully and the offsets committed. All these messages were processed by instance 1.
  • I’ve cut the logs between 9:19:00AM and 9:28:05AM as they contain no useful data.
  • At ~9:28:05AM instance 2 containing the same consumer setup and settings was spun up, causing a rebalance.
  • At ~9:28:17AM instance 1 started re-processing and committing the already processed and committed messages.
  • This continues for the remainder of the minute and the same offsets are present in the log at least once.
  • At no point does instance 2 appear to re-process any of the already-processed messages, although it does process some that are created as a direct result of instance 1 re-processing the older messages (which produces messages in a topic also handled by these consumers).

Happy to provide any extra details as needed to help diagnose this.
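
Since both reports show the wrong starting offset appearing right around a partition reassignment, one way to narrow down where it comes from is to log the committed offset and the consumer position inside partition assignment/revocation handlers and compare them with the broker-side __consumer_offsets data. This is purely a diagnostic sketch, not a confirmed fix, using the standard ConsumerBuilder callbacks; the names are placeholders:

using System;
using Confluent.Kafka;

static class DiagnosticConsumerFactory
{
    public static IConsumer<string, string> Build(ConsumerConfig config)
    {
        return new ConsumerBuilder<string, string>(config)
            .SetPartitionsAssignedHandler((c, partitions) =>
            {
                // What does the coordinator say is committed at the moment we receive the partitions?
                var committed = c.Committed(partitions, TimeSpan.FromSeconds(10));
                foreach (var tpo in committed)
                    Console.WriteLine($"ASSIGNED {tpo.TopicPartition}: committed={tpo.Offset}");
            })
            .SetPartitionsRevokedHandler((c, partitions) =>
            {
                // Where were we actually reading when the partitions were taken away?
                foreach (var tpo in partitions)
                    Console.WriteLine($"REVOKED {tpo.TopicPartition}: position={c.Position(tpo.TopicPartition)}");
            })
            .Build();
    }
}

If the committed offset logged at assignment already shows the older value, the regression happened on the commit path or in the broker's offset topic; if it is correct there but consumption still restarts lower, the problem lies in how the fetch position is initialised after the rebalance.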
