Duplicate consumption on multithreaded scenario (concurrency > 1)

When testing the concurrency parameter in the consumer config of a Spring Cloud Stream microservice (with Kafka), I noticed that several messages are processed twice. This seems to happen because the second thread (T2) joins a little later than the first one (T1), which triggers a rebalance before T1 has committed its offsets, so T2 re-reads some messages from its newly assigned partitions.
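The sequence described above can be modeled with a small, self-contained Java sketch. It is a toy model of a single partition, not Kafka or Spring code: the class name, method names, and the numbers in `main` are all hypothetical, and it assumes the new owner of a partition resumes from the last committed offset. It only illustrates the suspected mechanism; it does not confirm that this is what the binder actually does.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one partition with a single committed offset; not Kafka code. */
class RebalanceDuplicateSketch {

    /**
     * Returns every offset handled, in order, when T1 processes `processedByT1`
     * records but has committed only `committedByT1` of them before the
     * partition is reassigned to T2.
     */
    static List<Integer> processedOffsets(int logSize, int processedByT1, int committedByT1) {
        List<Integer> processed = new ArrayList<>();
        // T1 processes offsets 0 .. processedByT1 - 1.
        for (int offset = 0; offset < processedByT1; offset++) {
            processed.add(offset);
        }
        // Rebalance: T2 takes over and resumes from the last committed offset
        // (assumption of this model).
        for (int offset = committedByT1; offset < logSize; offset++) {
            processed.add(offset);
        }
        return processed;
    }

    /** Number of extra deliveries, i.e. handled records minus distinct offsets. */
    static long duplicateCount(List<Integer> processed) {
        return processed.size() - processed.stream().distinct().count();
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 10 records, T1 handled 6 but committed only 3.
        List<Integer> processed = processedOffsets(10, 6, 3);
        System.out.println("processed  = " + processed);
        System.out.println("duplicates = " + duplicateCount(processed));
    }
}
```

In this model the duplicates are exactly the records T1 handled but had not yet committed; if the committed offset equals the processed offset at the moment of reassignment, the count drops to zero.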

I have the idempotence parameters set up, but they do not help when the concurrency parameter is set to two: each thread may get its own producer, so the setup does not actually achieve exactly-once semantics.

Here is an example log:

[
  {
    "@timestamp": "2021-04-15T09:19:11.321+02:00",
    "@version": "1",
    "message": "my-consumer-group: partitions assigned: [MY_AWESOME_TOPIC-0, MY_AWESOME_TOPIC-1]",
    "logger_name": "org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder$1",
    "thread_name": "KafkaConsumerDestination{consumerDestinationName='MY_AWESOME_TOPIC', partitions=0, dlqName='null'}.container-0-C-1",
    "level": "INFO",
    "level_value": 20000
  },
  {
    "whatever": "some message processing...."
  },
  {
    "@timestamp": "2021-04-15T09:19:21.226+02:00",
    "@version": "1",
    "message": "my-consumer-group: partitions assigned: [MY_AWESOME_TOPIC-0]",
    "logger_name": "org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder$1",
    "thread_name": "KafkaConsumerDestination{consumerDestinationName='MY_AWESOME_TOPIC', partitions=0, dlqName='null'}.container-1-C-1",
    "level": "INFO",
    "level_value": 20000
  },
  {
    "@timestamp": "2021-04-15T09:19:21.227+02:00",
    "@version": "1",
    "message": "my-consumer-group: partitions assigned: [MY_AWESOME_TOPIC-1]",
    "logger_name": "org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder$1",
    "thread_name": "KafkaConsumerDestination{consumerDestinationName='MY_AWESOME_TOPIC', partitions=0, dlqName='null'}.container-0-C-1",
    "level": "INFO",
    "level_value": 20000
  }
]

Any clue as to why this happens? It puzzles me that there is a first assignment and then, 10 seconds later, the second thread joins and fires the rebalance while T1 has already started processing. Shouldn’t all N threads configured by the concurrency parameter start at the same time to avoid this?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

garyrussell commented, Apr 19, 2021 (1 reaction)

Spring’s behavior depends on the container AckMode. With AckMode.BATCH (the default), any pending offsets, for already processed records, are committed in onPartitionsRevoked; with AckMode.RECORD, commits are done immediately after processing each record, so there is nothing to do in onPartitionsRevoked since there is nothing pending.

This is not something the application needs to worry about.
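The difference between the two modes described in this comment can be sketched with a toy model in plain Java. This is not Spring for Apache Kafka code: the class, the `Mode` enum, and the method names are hypothetical stand-ins chosen to mirror the comment's terms, and the model tracks only two counters for a single partition.

```java
/** Toy model of when offsets get committed in each mode; not Spring code. */
class AckModeSketch {

    /** Hypothetical stand-in for the two container ack modes named in the comment. */
    enum Mode { BATCH, RECORD }

    private final Mode mode;
    private int processed = 0; // number of records this owner has handled
    private int committed = 0; // offset a new owner would resume from

    AckModeSketch(Mode mode) {
        this.mode = mode;
    }

    /** Handles `count` records; RECORD mode commits right after each one. */
    void process(int count) {
        for (int i = 0; i < count; i++) {
            processed++;
            if (mode == Mode.RECORD) {
                committed = processed;
            }
        }
    }

    /** Revocation callback: BATCH mode commits whatever is still pending. */
    void onPartitionsRevoked() {
        if (mode == Mode.BATCH) {
            committed = processed;
        }
        // RECORD mode has nothing pending, so there is nothing to do here.
    }

    int committedOffset() {
        return committed;
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            AckModeSketch owner = new AckModeSketch(mode);
            owner.process(4);
            int beforeRevoke = owner.committedOffset();
            owner.onPartitionsRevoked();
            System.out.println(mode + ": committed before revoke = " + beforeRevoke
                    + ", after revoke = " + owner.committedOffset());
        }
    }
}
```

In both modes of this model, the committed offset matches the processed offset once the revocation callback has run, which is the point the comment makes about the application not needing to handle it.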

Dionakra commented, Apr 20, 2021 (0 reactions)

Thanks Gary, I will tune those parameters to avoid duplicates. I am closing the issue. Again, thanks a lot!


