
[CT-96] Allow unique_key for incremental materializations to take a list

See original GitHub issue

Describe the feature

Right now, when creating a model in dbt with the materialization set to incremental, you can pass a single column to unique_key, which acts as the key for merging.

Ideally you would be able to pass multiple columns, since there are many cases where a table's primary key is defined by more than one column.

The simplest solution would be to change unique_key to accept a list (in addition to a string, for backwards compatibility) and build the merge predicates from the list instead of just a single column.

This might not be ideal, since the parameter name unique_key implies a single key. An alternative would be to add a new optional parameter, unique_key_list or unique_keys, that always takes a list, and to eventually deprecate the unique_key parameter.
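A sketch of what the list form could look like in a model config (hypothetical syntax at the time this issue was filed; only the single-string form was supported):

```sql
{{ config(
    materialized = 'incremental',
    unique_key = ['date_day', 'user_id']
) }}

select
    date_day,
    user_id,
    ...
```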

Describe alternatives you’ve considered

Not necessarily an alternative, but another thing to consider is the use of unique_key throughout a dbt project. It would stand to reason that whatever change is made here should apply to all other usages of unique_key. This could be done in one large roll-out or in stages, e.g. merges first, then upserts, then snapshots.
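For the merge case, building predicates from a list would mean expanding each column into an equality condition in the `on` clause. A rough sketch of the generated SQL (illustrative only; the `DBT_INTERNAL_DEST`/`DBT_INTERNAL_SOURCE` aliases follow dbt's default merge macro, and the table and column names are hypothetical):

```sql
merge into analytics.my_model as DBT_INTERNAL_DEST
using my_model__dbt_tmp as DBT_INTERNAL_SOURCE
    -- one equality predicate per column in the unique_key list
    on  DBT_INTERNAL_SOURCE.date_day = DBT_INTERNAL_DEST.date_day
    and DBT_INTERNAL_SOURCE.user_id  = DBT_INTERNAL_DEST.user_id
when matched then update set ...
when not matched then insert ...
```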

Additional context

This feature should work across all databases.

Who will this benefit?

Hopefully most dbt users. Currently the only workaround is to use dbt_utils.surrogate_key, which a) doesn't work for BigQuery and b) should ideally be an out-of-the-box dbt feature.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 11
  • Comments: 17 (15 by maintainers)

Top GitHub Comments

3 reactions
JCZuurmond commented, Jun 12, 2021

Seeing the discussion and the proposed solution, could we rename this issue to “Allow unique_key to take a list”, to make it more generic?

And I am OK with using unique_key for both a str and a List[str], even though unique_keys would be more accurate for the latter. This happens in pandas too: e.g. drop and pivot have a columns parameter that can be either str or List[str]. In my opinion this is better than adding a new parameter.

3 reactions
jtcohen6 commented, May 27, 2020

I see what you mean. When I think of creating a surrogate key for an incremental model, I’m thinking of creating a column within that model, to be stored in the resulting table and passed as the unique_key for subsequent incremental runs:

{{ config(
    materialized = 'incremental',
    unique_key = 'unique_id'
) }}

select
    date_day,
    user_id,
    {{ dbt_utils.surrogate_key('date_day', 'user_id') }} as unique_id

...

You’re right that, as a result of the way that merge macros are implemented on BigQuery, you cannot create the surrogate key directly within the config like so:

{{ config(
    materialized = 'incremental',
    unique_key = dbt_utils.surrogate_key('date_day', 'user_id')
) }}

I’ve now heard this change requested from several folks, including (if I recall correctly) some Snowflake users who have found that merging on cluster keys somewhat improves performance. So I’m not opposed to passing an array of column names. I’m worried that unique_keys is ambiguous; following the lead of the dbt-utils test, I’m thinking along the lines of unique_combination_of_columns.

@drewbanin What do you think? Is that too much config-arg creep?


