[SPIKE] Different ways of handling data retention in Kafka
Describe the solution you'd like
In this spike we would like to define the best default settings for data retention in the Kafka module.
Describe alternatives you've considered
Check whether changing log_retention_bytes from the default -1 (unlimited) to a limit based on available disk space is a better option.
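For context, a minimal sizing sketch of how a disk-derived value for log_retention_bytes could be computed. The helper name, the 80% safety margin, and the example numbers are illustrative assumptions, not Epiphany defaults:

```python
# Hypothetical sizing helper: derive a per-partition retention cap from
# available disk space, leaving headroom for open segments and indexes.
# The 0.8 safety factor and all numbers below are illustrative.

def retention_bytes_per_partition(disk_bytes: int,
                                  partitions_per_broker: int,
                                  safety_factor: float = 0.8) -> int:
    usable = int(disk_bytes * safety_factor)
    return usable // partitions_per_broker

# Example: a 500 GiB data disk hosting 100 partition replicas
# yields roughly a 4 GiB cap per partition.
cap = retention_bytes_per_partition(500 * 1024**3, 100)
print(cap)  # 4294967296
```

Note that Kafka applies log.retention.bytes per partition, not per topic, which is why any disk-based cap has to account for the number of partition replicas hosted on the broker.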
Issue Analytics
- Created 4 years ago
- Comments: 6 (5 by maintainers)
Top Results From Across the Web
- Managing Kafka Offsets to Avoid Data Loss (LotusFlare): There are different variables that could cause data loss in Kafka, including data offsets, consumer auto-commit configuration, etc.
- Lessons Learned From Running Kafka at Datadog: Kafka's approach to segment-level retention. Kafka organizes data into partitions to support parallel processing and to distribute workload.
- 20 best practices for Apache Kafka at scale (New Relic): Apache Kafka simplifies working with data streams, but it might get complex at scale. Learn best practices to help simplify that complexity.
- Apache Kafka: Ten Best Practices to Optimize Your Deployment: Compaction is a process by which Kafka ensures retention of at least the last known value for each message key (within the log...).
- Documentation (Apache Kafka): Store streams of records in a fault-tolerant, durable way. What is different about Kafka is that it is a very good storage...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I've read the docs about Kafka data retention policies, and it seems that using Kafka's default values in Epiphany is the best option. There are two retention policies that can be configured at the broker or topic level: by time and by size. We use the same default value for size retention as Kafka does, -1, which means that by default the log size is unlimited (only a retention time is configured).
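To illustrate the topic-level counterparts of these broker settings, here is a minimal sketch assuming the kafka-python client, a broker at localhost:9092, and a hypothetical topic named events; the values are examples, not recommendations:

```python
# Sketch: override both retention policies for one topic, leaving the
# broker-wide defaults (log.retention.hours=168, log.retention.bytes=-1)
# untouched. The topic name and values are illustrative.
from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

resource = ConfigResource(
    ConfigResourceType.TOPIC,
    "events",  # hypothetical topic
    configs={
        "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # time policy: 7 days
        "retention.bytes": str(1024**3),               # size policy: 1 GiB per partition
    },
)
admin.alter_configs([resource])
admin.close()
```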
We could use a size-based retention policy, but if the user does not change it and someone starts spamming with a large volume of messages, old ones will be lost, which is no better than a disk overflow. In my opinion, the right way is to have Kafka/disk monitoring and let the user decide which config values should be defined.
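To make the monitoring idea concrete, here is an illustrative disk check; the /var/lib/kafka path and the 85% threshold are assumptions, and a real deployment would more likely rely on a monitoring stack than a script:

```python
# Illustrative disk check: warn before an unlimited log.retention.bytes
# fills the volume that holds the Kafka log directories.
import shutil

def kafka_disk_alert(log_dir: str = "/var/lib/kafka",
                     threshold: float = 0.85) -> bool:
    usage = shutil.disk_usage(log_dir)
    used_ratio = (usage.total - usage.free) / usage.total
    if used_ratio >= threshold:
        print(f"WARNING: {log_dir} is {used_ratio:.0%} full")
        return True
    return False

kafka_disk_alert()
```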
I agree with @atsikham. I would be in favor of the default settings as described in the official documentation, which is what we have now in Epiphany. The user can change these parameters by modifying the specification and overriding the original values. Disk monitoring could help solve possible problems here.