Retention policy is applied 1 hour earlier

Describe the bug

As @Tho-Mat pointed out in https://github.com/corona-warn-app/cwa-server/issues/699#issuecomment-671358818, the retention policy is applied 1 hour earlier than it should be:

> I also noticed that the index for (for example) the hour files
>
> 2020-07-26-hour-06.zip created 26.07.2020 09:05 (German Time)
> was removed at                 09.08.2020 08:05 (German Time)
>
> 2020-07-26-hour-08.zip created 26.07.2020 11:05 (German Time)
> was removed at                 09.08.2020 10:05 (German Time)
>
> shouldn't they be removed one hour later to have them 14 days on the server?
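The gap in the quoted example can be checked with a little date arithmetic. The following is only a sketch: the helper name `time_on_server` is made up for illustration, and it treats the quoted German-time stamps as naive local times, which assumes no daylight-saving change between the two dates.

```python
from datetime import datetime, timedelta

FMT = "%d.%m.%Y %H:%M"  # format of the timestamps quoted above


def time_on_server(created: str, removed: str) -> timedelta:
    """Return how long a file was listed, from creation to removal."""
    return datetime.strptime(removed, FMT) - datetime.strptime(created, FMT)


# 2020-07-26-hour-06.zip from the quoted report
observed = time_on_server("26.07.2020 09:05", "09.08.2020 08:05")
shortfall = timedelta(days=14) - observed

print(observed)   # 13 days, 23:00:00
print(shortfall)  # 1:00:00
```

The file was listed for 13 days and 23 hours, one hour short of the expected 14 days.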

Expected behavior

Retention policy should apply after 14 days.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

2 reactions
fredrb commented, Sep 21, 2020

Update from our side: release 1.4 is currently planned for 30.09.2020.

2 reactions
fredrb commented, Aug 14, 2020

I figured out what is going on. Your analysis is correct: the index file drops the earliest hour one hour too early.

The problem that we’re facing happens because of the retention policy of the diagnosis keys.

To give an overview of what happens and why:

  1. As previously stated in this thread, the retention policy for the object store either deletes the whole day or nothing. This means that regardless of the current hour, the files can still be fetched from the CDN. However, in the index file, we can see that entries are removed hourly, and they are removed one hour too soon.
  2. The index file is created by the Assembly process, based on the keys mapped to be exported into the hourly files.
  3. The mapping of keys is a simple DB read aggregating keys by their submission timestamp.

During the retention policy step, we delete all keys from the database that match the query submission_timestamp <= threshold, where threshold is now - retention_policy. This means that if we run the retention policy now (14.08 at 14:00), it deletes every entry with a timestamp before or equal to 31.07 14:00. Keys generated for 31.07 14:00 are therefore deleted, which is not correct, since we are now distributing keys generated between 13:00 and 14:00 today.
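To make the off-by-one concrete, here is a minimal sketch of the two comparisons applied to an in-memory list instead of the database. The function name `expired` and the use of `datetime` values truncated to the hour are assumptions for illustration only; they do not reflect the actual cwa-server schema or query code.

```python
from datetime import datetime, timedelta


def expired(submission_timestamps, now, retention, inclusive):
    """Return the timestamps a retention run would delete.

    inclusive=True  models submission_timestamp <= threshold
    inclusive=False models submission_timestamp <  threshold
    """
    threshold = now - retention
    if inclusive:
        return [t for t in submission_timestamps if t <= threshold]
    return [t for t in submission_timestamps if t < threshold]


now = datetime(2020, 8, 14, 14, 0)   # 14.08 at 14:00, as in the comment
retention = timedelta(days=14)       # threshold becomes 31.07 14:00

# Hypothetical hourly key buckets around the threshold
keys = [datetime(2020, 7, 31, hour, 0) for hour in (12, 13, 14, 15)]

deleted_inclusive = expired(keys, now, retention, inclusive=True)
deleted_exclusive = expired(keys, now, retention, inclusive=False)

# The only difference is the bucket that sits exactly on the threshold.
print(sorted(set(deleted_inclusive) - set(deleted_exclusive)))
```

With the inclusive comparison the 31.07 14:00 bucket is deleted as well; with the exclusive one it survives until the next hourly run.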

To illustrate this better, we can assume a scenario with a single-day retention policy, with distribution running on day 2 at 8 am.

Wrong behavior

Day 1 | Day 2
00:00 to 06:00
07:00 --> Created
08:00 --> Deleted
09:00 to 23:00

This is wrong because we are only holding 23 hours' worth of data. Hour 7 is created at 8:00 but removed at 7:00 the next day.

Correct behavior

Day 1 | Day 2
00:00 to 06:00
07:00 --> Delete | 07:00 --> Created
08:00 to 23:00

We should delete everything before 8 am (not inclusive), to ensure that we will always have 24h of keys in the system.
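The single-day scenario can be simulated by counting how many hourly buckets survive a retention run. This is a sketch under stated assumptions: the calendar date is arbitrary, `retained_hours` is a hypothetical helper, and each bucket is identified by the start of its hour.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def retained_hours(first_bucket, now, retention, inclusive_delete):
    """Return the hourly buckets still held after a retention run at `now`."""
    threshold = now - retention
    buckets = []
    t = first_bucket
    while t < now:              # only hours that have already started exist
        buckets.append(t)
        t += HOUR
    if inclusive_delete:        # deletes timestamp <= threshold (wrong behavior)
        return [b for b in buckets if b > threshold]
    return [b for b in buckets if b >= threshold]   # deletes timestamp < threshold


day1 = datetime(2020, 1, 1)                  # arbitrary date standing in for "Day 1"
now = day1 + timedelta(days=1, hours=8)      # Day 2 at 08:00
retention = timedelta(days=1)

wrong = retained_hours(day1, now, retention, inclusive_delete=True)
correct = retained_hours(day1, now, retention, inclusive_delete=False)

print(len(wrong))    # 23
print(len(correct))  # 24
```

Switching from an inclusive to an exclusive comparison keeps the Day 1 08:00 bucket, which restores the full 24 hours.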

There's already a fix for that; I will open the PR in a couple of minutes. @Tho-Mat, I would also appreciate your comments there. Thanks again for your help investigating and explaining this issue.
