Treat blank strings differently to nulls in surrogate_key()

Describe the bug

If I had a table like so:

field1 (STRING)   field2
NULL              1
''                1

Running dbt_utils.surrogate_key(['field1', 'field2']) would produce the same value for both rows.

This is because of the coalesce() applied to each value during concatenation (linked in the original issue).

DISTINCT and GROUP BY operations treat NULL differently from '', so should we do that as well when making a unique key? Moreover, what would be the proper way to implement this? You likely cannot use another string as a universal substitute, and adding a randomly generated int or string would break idempotency for surrogate keys.
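To make the collision concrete, here is a minimal Python sketch of the behaviour described above. It is not the dbt_utils macro itself: the function name `surrogate_key_coalesce`, the `-` separator, and the use of MD5 are assumptions chosen for illustration. The only part taken from the issue is the coalesce-to-empty-string step.

```python
import hashlib


def surrogate_key_coalesce(values, sep="-"):
    """Hash a row after replacing None with '' (mimics coalesce(value, '')).

    Hypothetical helper for illustration; the separator and hash function
    are assumptions, not taken from the dbt_utils source.
    """
    parts = ["" if v is None else str(v) for v in values]
    return hashlib.md5(sep.join(parts).encode("utf-8")).hexdigest()


# The two rows from the table above: (NULL, 1) and ('', 1).
null_row_key = surrogate_key_coalesce([None, 1])
blank_row_key = surrogate_key_coalesce(["", 1])

# Both rows concatenate to the same string "-1" before hashing.
print(null_row_key == blank_row_key)  # True
```

Because the NULL is replaced before concatenation, the hash function never sees any difference between the two rows.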

System information

  - package: dbt-labs/dbt_utils
    version: 0.7.6

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

0.20.2

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 15 (12 by maintainers)

Top GitHub Comments

joellabes commented, Sep 16, 2022 (2 reactions)

Yes @callum-mcdata, I think that works well.

Thanks everyone for weighing in, especially as I don’t know that I did a good job of explaining the second option.

Another attempt:

  • The current surrogate_key macro will be fully deprecated and only throw an error message.
  • The new macro, generate_surrogate_key, will default to treating nulls and blanks differently.
  • Users can opt into the old setting at a project-wide level with a variable.

By doing this,

  • The default path works correctly for all future users without any additional config, which is the purpose of the v1.0 migration.
  • We also avoid accidentally hosing people’s primary keys that they use in snapshots/incremental models, which a) causes data integrity issues and b) could cause a lot of additional spend on reprocessing data.
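The plan above (a new function that distinguishes nulls from blanks by default, with an opt-in to the old behaviour) can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation that was merged: `generate_key`, the `treat_nulls_as_empty` flag, the `NULL_PLACEHOLDER` value, the `-` separator, and MD5 are all hypothetical names and choices.

```python
import hashlib

# Hypothetical sentinel standing in for NULL; the real value is an assumption.
NULL_PLACEHOLDER = "_null_placeholder_"


def generate_key(values, treat_nulls_as_empty=False, sep="-"):
    """Hash a row, keeping NULL distinct from '' unless the caller opts out.

    treat_nulls_as_empty=True reproduces the old coalesce-to-'' behaviour,
    so keys already stored in snapshots or incremental models stay stable.
    """
    null_token = "" if treat_nulls_as_empty else NULL_PLACEHOLDER
    parts = [null_token if v is None else str(v) for v in values]
    return hashlib.md5(sep.join(parts).encode("utf-8")).hexdigest()


# Default: NULL and '' now produce different keys.
print(generate_key([None, 1]) == generate_key(["", 1]))  # False

# Opt-in to the old behaviour: the keys collide again, as before.
print(
    generate_key([None, 1], treat_nulls_as_empty=True)
    == generate_key(["", 1], treat_nulls_as_empty=True)
)  # True
```

A fixed sentinel keeps the key deterministic, which a random value would not. It does not fully answer the concern raised in the issue, though: a real column value equal to the sentinel would still collide with NULL, so the sentinel should be a string unlikely to occur in the data.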

The cost of this approach is that anyone using the Cloud IDE who doesn’t have someone to do a local clone and update will be unable to upgrade to utils 1.0 without doing something like that GitHub action I linked above. If someone makes the action, we can link it in the changelog at some point.

dbeatty10 commented, Oct 17, 2022 (1 reaction)

Resolved by #685
