
PeriodLoadRule cannot Remove expired segment

See original GitHub issue

Recently, while deploying a hot/cold tiered Druid cluster, I found that hot nodes were loading data outside their configured time range, so their storage filled up quickly. I found the same problem reported on the Druid forum, where it has gone unanswered for a long time. Looking at RunRules.java, I think there is a problem: PeriodLoadRule never drops expired segments at all, it only drops excess replicants. Does the current implementation of PeriodLoadRule meet expectations?

The following is the current implementation in Druid:

// RunRules.run
      // Only the first rule that applies to the segment is run; rules that do
      // not apply are skipped, so a PeriodLoadRule never gets a chance to drop
      // segments that have aged out of its period.
      for (Rule rule : rules) {
        if (rule.appliesTo(segment, now)) {
          if (
              stats.getGlobalStat(
                  "totalNonPrimaryReplicantsLoaded") >= paramsWithReplicationManager.getCoordinatorDynamicConfig()
                                                                                   .getMaxNonPrimaryReplicantsToLoad()
              && !paramsWithReplicationManager.getReplicationManager().isLoadPrimaryReplicantsOnly()
          ) {
            log.info(
                "Maximum number of non-primary replicants [%d] have been loaded for the current RunRules execution. Only loading primary replicants from here on for this coordinator run cycle.",
                paramsWithReplicationManager.getCoordinatorDynamicConfig().getMaxNonPrimaryReplicantsToLoad()
            );
            paramsWithReplicationManager.getReplicationManager().setLoadPrimaryReplicantsOnly(true);
          }
          stats.accumulate(rule.run(coordinator, paramsWithReplicationManager, segment));
          foundMatchingRule = true;
          break;
        }
      }

For now, I have worked around this problem by adding a dropAllExpireSegments method to PeriodLoadRule.java, but I don't know what side effects it might have.

Here is my implementation:

// RunRules.run
      for (Rule rule : rules) {
        if (rule.appliesTo(segment, now)) {
          if (
              stats.getGlobalStat(
                  "totalNonPrimaryReplicantsLoaded") >= paramsWithReplicationManager.getCoordinatorDynamicConfig()
                                                                                   .getMaxNonPrimaryReplicantsToLoad()
              && !paramsWithReplicationManager.getReplicationManager().isLoadPrimaryReplicantsOnly()
          ) {
            log.info(
                "Maximum number of non-primary replicants [%d] have been loaded for the current RunRules execution. Only loading primary replicants from here on for this coordinator run cycle.",
                paramsWithReplicationManager.getCoordinatorDynamicConfig().getMaxNonPrimaryReplicantsToLoad()
            );
            paramsWithReplicationManager.getReplicationManager().setLoadPrimaryReplicantsOnly(true);
          }
          stats.accumulate(rule.run(coordinator, paramsWithReplicationManager, segment));
          foundMatchingRule = true;
          break;
        } else {
          // Added drop logic: only PeriodLoadRule implements dropAllExpireSegments,
          // so only that rule drops segments that have expired out of its period.
          rule.dropAllExpireSegments(paramsWithReplicationManager, segment);
        }
      }
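For context, here is a hypothetical sketch of what such a hook inside PeriodLoadRule might look like. This is not the actual patch from this issue, dropAllExpireSegments is not part of Druid's Rule interface, and the coordinator calls used below (getDruidCluster, ServerHolder, LoadQueuePeon.dropSegment) are assumptions based on the 0.22 codebase and may differ in other versions:

      // Hypothetical sketch only -- illustrative, not actual Druid code.
      public void dropAllExpireSegments(DruidCoordinatorRuntimeParams params, DataSegment segment)
      {
        // Only drop segments this PeriodLoadRule no longer applies to, i.e.
        // segments whose interval has aged out of the configured period.
        if (appliesTo(segment, DateTimes.nowUtc())) {
          return;
        }
        // Ask every server that still holds a replica of this segment to drop it.
        for (ServerHolder holder : params.getDruidCluster().getAllServers()) {
          if (holder.getServer().getSegment(segment.getId()) != null) {
            holder.getPeon().dropSegment(segment, null);
          }
        }
      }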

Affected Version

0.22.0

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

2 reactions
kfaraz commented, Sep 16, 2022

@599166320, drop is handled by DropRules such as ForeverDropRule; none of the LoadRules are supposed to have that capability. You typically specify a list of rules for each datasource. The coordinator tries to find the first rule which applies to a given segment at a given time and tries to do what that matched rule suggests. If at any point in the lifetime of a segment it matches a DropRule, it gets dropped.

So, I think the problem you are facing can be solved by simply having a ForeverDropRule at the end of your retention rule list (default or datasource-specific). Please let us know if this works for you.
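As an illustration (not from the original thread), a retention rule list along the lines kfaraz suggests might look like the following, where the period and tier name are placeholder values:

      [
        {
          "type": "loadByPeriod",
          "period": "P30D",
          "includeFuture": true,
          "tieredReplicants": { "hot": 1 }
        },
        { "type": "dropForever" }
      ]

Segments whose interval falls inside the trailing 30-day window match the loadByPeriod rule and stay on the hot tier; anything older falls through to dropForever and is removed from the cluster (the segments remain in deep storage unless a kill task deletes them).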

1 reaction
599166320 commented, Sep 19, 2022

@kfaraz The cold tier of our cluster has enough historicals. The current problem is that we don't want the _default_tier's storage to fill up too fast.

I will create a PR and let you review it.
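As an editorial illustration of the setup described here (tier names, periods, and replicant counts are placeholders): if dropping data from the cluster is not acceptable because the cold tier has enough capacity, the hot/cold split is usually expressed with tiered load rules instead of a drop rule, e.g.:

      [
        {
          "type": "loadByPeriod",
          "period": "P7D",
          "includeFuture": true,
          "tieredReplicants": { "hot": 1, "cold": 1 }
        },
        {
          "type": "loadForever",
          "tieredReplicants": { "cold": 2 }
        }
      ]

With this chain, segments older than seven days only match the loadForever rule, so the coordinator is expected to keep them on the cold tier and drop the extra hot-tier replica; the behavior reported in this issue is that this hot-tier copy is not actually being removed.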

