How to achieve 'exactly once' semantics by using an SQL database
In the docs under manual commits (https://kafka.js.org/docs/1.12.0/consuming#manual-commits) it says:
Note that you don't have to store consumed offsets in Kafka, but instead store it in a storage mechanism of your own choosing. That's an especially useful approach when the results of consuming a message are written to a datastore that allows atomically writing the consumed offset with it, like for example a SQL database. When possible it can make the consumption fully atomic and give "exactly once" semantics that are stronger than the default "at-least once" semantics you get with Kafka's offset commit functionality.
Could you please point me to a working example of how this would be achieved?
Many thanks
-Paul
Issue Analytics: Created 2 years ago · Comments: 5
Don’t have a public working example, but something like this should work:
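(The snippet originally posted here did not survive. Judging by the follow-up answer, it checked an `isProcessed` flag in the database before handling a message. A minimal sketch of that idea, with an in-memory object standing in for the SQL table and all names invented for illustration:)

```javascript
// Sketch only: an in-memory stand-in for a SQL table of processed offsets.
// In a real setup this would be a table with a UNIQUE constraint on
// (topic, partition, offset); all names here are hypothetical.
const processed = new Set();

function key(topic, partition, offset) {
  return `${topic}:${partition}:${offset}`;
}

// Returns true if the message had not been seen before and was handled.
function handleMessage({ topic, partition, offset, value }, sideEffect) {
  const isProcessed = processed.has(key(topic, partition, offset));
  if (isProcessed) return false;                // already handled: skip
  sideEffect(value);                            // do the actual work
  processed.add(key(topic, partition, offset)); // mark as done
  return true;
}
```

With kafkajs this would run inside `eachMessage` with `autoCommit: false`. Note that the check and the mark are two separate steps, which matters for the critique in the next answer.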
Gonna try to create an example repo for this, hope it helps
Exactly-once is a very misunderstood topic. When Confluent says that Kafka supports exactly-once semantics, they mean it in the sense that the observable outcome of processing a topic is the same whether a message has been consumed once or several times, where the observable outcome is the output topic. This applies strictly within the context of a stream that consumes from one topic and produces to another.
It does not mean that the message is only ever seen once. It just means that if someone is consuming the output topic, the result will be the same whether the stream processor processed the input message once or several times. This can be achieved using a transactional producer. https://dzone.com/articles/interpreting-kafkas-exactly-once-semantics
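To make the transactional-producer approach concrete, here is a sketch of the consume-transform-produce loop using the kafkajs transactions API. Broker addresses, topic names, the group id, and the `transform` helper are all placeholder assumptions; this is not runnable without a cluster.

```javascript
// Sketch: exactly-once from input-topic to output-topic via a
// transactional producer. All names are placeholders.
async function run() {
  const { Kafka } = require('kafkajs'); // loaded lazily: sketch only
  const kafka = new Kafka({ clientId: 'eos-example', brokers: ['localhost:9092'] });
  const consumer = kafka.consumer({ groupId: 'eos-group' });
  // transactionalId + idempotent enable exactly-once delivery into the
  // output topic.
  const producer = kafka.producer({
    transactionalId: 'eos-example-tx',
    maxInFlightRequests: 1,
    idempotent: true,
  });

  await consumer.connect();
  await producer.connect();
  await consumer.subscribe({ topic: 'input-topic' });

  await consumer.run({
    autoCommit: false,
    eachMessage: async ({ topic, partition, message }) => {
      const transaction = await producer.transaction();
      try {
        await transaction.send({
          topic: 'output-topic',
          messages: [{ value: transform(message.value) }],
        });
        // Commit the consumed offset as part of the same transaction.
        await transaction.sendOffsets({
          consumerGroupId: 'eos-group',
          topics: [{ topic, partitions: [{ partition, offset: nextOffset(message.offset) }] }],
        });
        await transaction.commit();
      } catch (e) {
        await transaction.abort();
        throw e;
      }
    },
  });
}

// The committed offset is the *next* offset to read, hence +1.
function nextOffset(offset) {
  return (Number(offset) + 1).toString();
}

function transform(value) {
  return value; // placeholder for real processing
}
```

If processing fails, the transaction is aborted and neither the output message nor the offset commit becomes visible to downstream consumers reading with `read_committed` isolation.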
What @toledompm describes is not exactly-once, for several reasons. First, what if the group is rebalancing and the partition is reassigned to another consumer? Then you potentially have two different consumers processing the same message at the same time. If the first consumer hasn’t finished processing yet, `isProcessed` will be false for both and they will both continue to process the message. Using something like advisory locks could prevent this case: whoever comes first takes the lock, checks whether the message has been processed, processes it, and finally writes to the DB before releasing the lock. However, what if writing to the DB fails? You can’t “unprocess” the message, so it’s going to be at-least-once regardless.

Going back to the “observable outcome” interpretation of exactly-once, you can indeed achieve this with a transactional database as well, as long as you store the offsets together with the “result” of the operation. For example:
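(The example originally posted here is missing. A sketch of the idea, with a plain object simulating the database and all table/function names invented: the result of processing and the next offset are written in what would be a single SQL transaction, and on restart the consumer resumes from the offset stored in the DB via `consumer.seek()`.)

```javascript
// Sketch: result and consumer offset committed in one SQL transaction,
// simulated with a plain object. In Postgres this would be roughly:
//   BEGIN;
//   INSERT INTO results ...;
//   UPDATE offsets SET next = $1 WHERE topic = $2 AND partition = $3;
//   COMMIT;
function makeDb() {
  return { results: [], offsets: {} }; // one offsets "row" per topic-partition
}

function offsetKey(topic, partition) {
  return `${topic}:${partition}`;
}

// Where to resume after a restart: the offset stored in the DB,
// which would be passed to consumer.seek() in kafkajs.
function resumeOffset(db, topic, partition) {
  return db.offsets[offsetKey(topic, partition)] ?? 0;
}

// "Atomically" (synchronously, in this simulation) store the result of
// processing together with the next offset to consume.
function handleMessage(db, { topic, partition, offset, value }) {
  if (offset < resumeOffset(db, topic, partition)) return; // stale redelivery
  db.results.push(value.toUpperCase()); // the "result" of processing
  db.offsets[offsetKey(topic, partition)] = offset + 1;
}
```

Because the result and the offset commit either both happen or neither does, replaying a message after a crash changes nothing observable in the database, which is the exactly-once guarantee the kafkajs docs are describing.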
The docs should probably be amended, since they might give the wrong idea about when this is useful. The Kafka docs describe it as well.