Create a column that captures why a snapshot record was invalidated

Describe the feature

With the introduction of the invalidate_hard_deletes config, there are now two reasons why a record can be invalidated: either the record was updated, or it was deleted. It may be useful to differentiate between the two.

It’s possible to handle this in a downstream modeling layer, but it feels more reasonable to handle it in the snapshot itself by adding a column, dbt_invalidate_reason, with values of 'hard_delete', 'update' or null.

This information could be useful if someone wants to create a “current view w/ soft deletes” of the data (i.e. the view of their data that should have existed in the first place, hehe):

select
  *
from {{ ref('my_snapshot') }}
where (dbt_valid_to is null or dbt_invalidate_reason = 'hard_delete')
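To make the idea concrete, here is a minimal sketch that runs the same filter against a tiny in-memory SQLite table standing in for the snapshot. The dbt_invalidate_reason column is the hypothetical column proposed in this issue (dbt does not produce it), and the table contents and the id/status columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for a snapshot table. dbt_invalidate_reason is the HYPOTHETICAL
# column proposed in this issue; id and status are illustrative columns.
conn.execute("""
    create table my_snapshot (
        id integer,
        status text,
        dbt_valid_from text,
        dbt_valid_to text,
        dbt_invalidate_reason text
    )
""")
conn.executemany(
    "insert into my_snapshot values (?, ?, ?, ?, ?)",
    [
        # record 1 was updated once and is still live
        (1, "pending", "2021-01-01", "2021-01-02", "update"),
        (1, "shipped", "2021-01-02", None, None),
        # record 2 was hard-deleted from the source
        (2, "pending", "2021-01-01", "2021-01-03", "hard_delete"),
    ],
)

# Same filter as the query above: live rows plus the last known
# version of every hard-deleted row.
current_with_soft_deletes = conn.execute("""
    select id, status, dbt_invalidate_reason
    from my_snapshot
    where (dbt_valid_to is null or dbt_invalidate_reason = 'hard_delete')
    order by id
""").fetchall()

print(current_with_soft_deletes)
```

The superseded 'pending' version of record 1 is filtered out, while the final version of the deleted record 2 is kept and can be flagged as soft-deleted downstream.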

Describe alternatives you’ve considered

Handling in the modeling layer:

with snapshot_rows as (
    select
        *,
        -- start of the next version of the same record, if there is one
        lead(dbt_valid_from) over (
            partition by {{ unique_key }} order by dbt_valid_from
        ) as next_record_started
    from {{ ref('my_snapshot') }}
)

select
    *,
    case
      -- invalid, no next record
      when dbt_valid_to is not null and next_record_started is null then 'deleted'
      -- gap between this record ending and the next one starting
      when next_record_started > dbt_valid_to then 'deleted'
      -- no gap until the next record
      when next_record_started = dbt_valid_to then 'updated'
      -- this record is still valid
      else null end as dbt_invalidated_reason
from snapshot_rows
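As a rough check of the modeling-layer approach, the sketch below runs the same classification over sample data in an in-memory SQLite database. It assumes a SQLite build with window-function support (3.25 or later), swaps the Jinja placeholders for a literal table and an illustrative id key, and compares the two timestamps directly instead of subtracting them. The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for a snapshot built with invalidate_hard_deletes enabled.
# id plays the role of the unique key; all rows are illustrative.
conn.execute("""
    create table my_snapshot (
        id integer,
        dbt_valid_from text,
        dbt_valid_to text
    )
""")
conn.executemany(
    "insert into my_snapshot values (?, ?, ?)",
    [
        (1, "2021-01-01", "2021-01-02"),  # updated: next version starts at once
        (1, "2021-01-02", None),          # still valid
        (2, "2021-01-01", "2021-01-03"),  # deleted: no later version
        (3, "2021-01-01", "2021-01-02"),  # deleted, then returned after a gap
        (3, "2021-01-05", None),          # still valid
    ],
)

reasons = conn.execute("""
    with snapshot_rows as (
        select
            *,
            lead(dbt_valid_from) over (
                partition by id order by dbt_valid_from
            ) as next_record_started
        from my_snapshot
    )
    select
        id,
        dbt_valid_from,
        case
            when dbt_valid_to is not null and next_record_started is null then 'deleted'
            when next_record_started > dbt_valid_to then 'deleted'
            when next_record_started = dbt_valid_to then 'updated'
            else null
        end as dbt_invalidated_reason
    from snapshot_rows
    order by id, dbt_valid_from
""").fetchall()

for row in reasons:
    print(row)
```

Record 3 shows the case the gap check exists for: the row was deleted and later came back, so its first version ends before the next one begins.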

Additional context

  • If we introduce this, we might want to provide an operation for people to backfill this reason.
  • Would it also be useful to include a dbt_new_record_reason column with one of ['new', 'update', 'returned']?

Who will this benefit?

Advanced users of snapshots. As always when it comes to snapshots, I'm curious to hear input from @joellabes and @codigo-ergo-sum.

Are you interested in contributing this feature?

Could do, if you want me to.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

joellabes commented, Jan 25, 2021 (1 reaction)

I talked to Claire about this a bit in DMs, and my argument was that if I’d done this feature, I would have had an explicit dbt_is_deleted column. Deleted rows would have that set to true with a null dbt_valid_to. (I didn’t do the feature though so am happy to take what I get 😂)

I’ve thought a bit more about how we use snapshots and realised we always filter out the deleted ones once we finish date spining anyway, so including them just to remove them again doesn’t make a heap of sense.

I still support this because being more explicit is better than less, but I don’t think I have an immediate use case beyond making auditing easier (which ain’t nothing!)

bp-vimalathithan commented, Nov 2, 2022 (0 reactions)

Do we have this feature implemented in the latest versions?
