[SPIKE] Different ways of handling data retention in Kafka

See original GitHub issue

Describe the solution you’d like
In this spike we would like to define the best default settings for data retention in the Kafka module.

Describe alternatives you’ve considered
Check whether changing log_retention_bytes == -1 to a limited disk-space value is a better option than unlimited retention.
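For reference, the broker-level settings in question are listed below with Kafka’s stock defaults (this is a plain server.properties sketch for orientation, not the configuration Epiphany generates):

    # Kafka broker retention defaults (server.properties), shown for reference only
    log.retention.hours=168                  # delete log segments older than 7 days
    log.retention.bytes=-1                   # -1 = no size limit (applies per partition)
    log.segment.bytes=1073741824             # segments roll at 1 GiB; only closed segments are eligible for deletion
    log.retention.check.interval.ms=300000   # how often the broker checks for segments to delete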

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
atsikham commented on Feb 3, 2021

I’ve read the docs on Kafka data retention policies, and it seems that using Kafka’s default values in Epiphany is the best option. There are two retention policies that can be configured at the broker or topic level: by time and by size. For size retention we use the same default value as Kafka, -1, which means that by default the Kafka log size is unlimited (only retention time is enforced).

We could use a size-based retention policy, but if the user does not change it and someone starts spamming with a large volume of messages, the oldest ones will be deleted, which is no better than a disk overflow. In my opinion, the right way is to have Kafka/disk monitoring and let the user decide which config values should be set.
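As a footnote to the broker-vs-topic point above: if a particular workload does need a size cap, Kafka lets it be set per topic without touching the broker defaults. A minimal sketch follows (the broker address and topic name are placeholders; older clusters use the --zookeeper flag instead of --bootstrap-server):

    # Cap one topic at ~1 GiB per partition and 7 days, leaving broker defaults untouched
    bin/kafka-configs.sh --bootstrap-server localhost:9092 \
      --entity-type topics --entity-name example-topic \
      --alter --add-config retention.bytes=1073741824,retention.ms=604800000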

0 reactions
przemyslavic commented on Feb 12, 2021

I agree with @atsikham. I would be in favor of the default settings as described in the official documentation, which is what we have now in Epiphany. The user can change these parameters by modifying the specification and overriding the original values. Disk monitoring could help catch possible problems here.
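To illustrate the override path mentioned above, a hypothetical sketch of such an Epiphany configuration document follows; the key names mirror the issue’s log_retention_bytes wording, but the exact schema should be verified against Epiphany’s own configuration/kafka defaults:

    # Hypothetical Epiphany-style override, not a verified schema
    kind: configuration/kafka
    specification:
      kafka_var:
        log_retention_hours: 168    # time-based retention (Kafka default: 7 days)
        log_retention_bytes: -1     # -1 = unlimited size, matching Kafka's default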

Read more comments on GitHub >

Top Results From Across the Web

  • Managing Kafka Offsets to Avoid Data Loss - LotusFlare
    There are different variables that could cause data loss in Kafka, including data offsets, consumer auto-commit configuration, etc.

  • Lessons Learned From Running Kafka at Datadog
    Kafka's approach to segment-level retention. Kafka organizes data into partitions to support parallel processing and to distribute workload ...

  • 20 best practices for Apache Kafka at scale | New Relic
    Apache Kafka simplifies working with data streams, but it might get complex at scale. Learn best practices to help simplify that complexity.

  • Apache Kafka: Ten Best Practices to Optimize Your Deployment
    Compaction is a process by which Kafka ensures retention of at least the last known value for each message key (within the log...

  • Documentation - Apache Kafka
    Store streams of records in a fault-tolerant durable way. ... What is different about Kafka is that it is a very good storage...
