Treat blank strings differently to nulls in surrogate_key()
Describe the bug
If I had a table like so:
| field1 (STRING) | field2 |
|---|---|
| NULL | 1 |
| '' | 1 |
Running dbt_utils.surrogate_key(['field1', 'field2']) would produce the same value for both rows.
This is because of the coalesce() that is applied to each value during concatenation, which turns a NULL into an empty string before the values are hashed.
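The collision described above can be reproduced outside the warehouse. The following is a minimal Python sketch, not the macro's actual SQL: it assumes each field is coalesced to an empty string, the fields are joined with a `-` separator, and the result is hashed with MD5. The function name `surrogate_key` and the separator are assumptions made for illustration.

```python
import hashlib


def surrogate_key(values, null_placeholder=""):
    """Rough stand-in for the macro: coalesce each value, join, then hash.

    The "-" separator and the use of MD5 are assumptions for this sketch.
    """
    # Mimics coalesce(cast(value as text), '') applied to every field.
    parts = [null_placeholder if v is None else str(v) for v in values]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


key_null = surrogate_key([None, 1])   # row with field1 = NULL
key_blank = surrogate_key(["", 1])    # row with field1 = ''

# Both rows reduce to the string "-1" before hashing, so the keys match.
print(key_null == key_blank)  # True
```

Because both rows collapse to the same concatenated string, the hash cannot tell them apart, which is the behaviour reported here.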
DISTINCT and GROUP BY operations treat NULL differently from '', so should we do the same when building a unique key? Moreover, what would be the proper way to implement this? You likely cannot use another string as a universal substitute, and adding a randomly generated int or string would break idempotency for surrogate keys.
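To make the trade-off in that question concrete, here is a small Python sketch of the "substitute string" idea. It is an illustration only: the sentinel value `__null__`, the `-` separator, and the helper name `surrogate_key_with_sentinel` are all hypothetical and are not taken from dbt_utils.

```python
import hashlib

# Hypothetical placeholder for NULL; any fixed string has the same trade-off.
NULL_SENTINEL = "__null__"


def surrogate_key_with_sentinel(values, sentinel=NULL_SENTINEL):
    """Replace None with a fixed sentinel, join with "-", and hash with MD5."""
    parts = [sentinel if v is None else str(v) for v in values]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


null_key = surrogate_key_with_sentinel([None, 1])
blank_key = surrogate_key_with_sentinel(["", 1])
literal_key = surrogate_key_with_sentinel(["__null__", 1])

# NULL and '' now hash differently, and the result is deterministic.
print(null_key != blank_key)    # True
# The remaining risk: a real value equal to the sentinel still collides.
print(null_key == literal_key)  # True
```

A fixed sentinel keeps the key idempotent, unlike a random value, but it is not a universal substitute: a column that legitimately contains the sentinel text would still collide with NULL, so the sentinel has to be something unlikely to appear in real data.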
System information
- package: dbt-labs/dbt_utils
version: 0.7.6
Which database are you using dbt with?
- postgres
- redshift
- bigquery
- snowflake
- other (specify: ____________)
The output of dbt --version:
0.20.2
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:15 (12 by maintainers)

Yes @callum-mcdata, I think that works well.
Thanks everyone for weighing in, especially as I don’t know that I did a good job of explaining the second option.
Another attempt:
By doing this,
The cost of this approach is that anyone using the Cloud IDE who doesn’t have someone to do a local clone and update will be unable to upgrade to utils 1.0 without doing something like that GitHub action I linked above. If someone makes the action, we can link it in the changelog at some point.
Resolved by #685