Create zero_pad_timestamp_ms() macro to work around Snowflake and Redshift producing different results in surrogate_key()

Describe the bug

Because they cast values to strings differently, Redshift and Snowflake can produce different surrogate keys for the same data set. This is very confusing if you are migrating between them or using both databases. The main problem is the casting of timestamp fields, which Redshift implements in a really strange way: it trims trailing zeros from the fractional seconds. For example:

Input:

select
    cast(TIMESTAMP '2021-03-19 17:07:10.123321' as varchar),
    cast(TIMESTAMP '2021-03-19 17:07:10.123000' as varchar),
    cast(TIMESTAMP '2021-03-19 17:07:10.100000' as varchar),
    cast(TIMESTAMP '2021-03-19 17:07:10.000000' as varchar);

Output Redshift:

2021-03-19 17:07:10.123321,
2021-03-19 17:07:10.123,
2021-03-19 17:07:10.10,
2021-03-19 17:07:10

Output Snowflake:

2021-03-19 17:07:10.123321,
2021-03-19 17:07:10.123000,
2021-03-19 17:07:10.100000,
2021-03-19 17:07:10.000000
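
To make the consequence concrete, here is a minimal Python sketch of why the differing casts matter. It assumes that surrogate_key hashes the string form of each field, joined with a "-" separator, using MD5; the function below is a hypothetical stand-in for illustration, not the dbt_utils implementation, and the user id "42" is made up.

```python
import hashlib


def surrogate_key(*fields: str) -> str:
    """Hypothetical stand-in: MD5 over the '-'-joined string fields."""
    joined = "-".join(fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# String forms of the same timestamp, taken from the outputs above.
redshift_cast = "2021-03-19 17:07:10.123"
snowflake_cast = "2021-03-19 17:07:10.123000"

key_redshift = surrogate_key("42", redshift_cast)
key_snowflake = surrogate_key("42", snowflake_cast)

# Same logical row, different keys, because the hashed strings differ.
print(key_redshift == key_snowflake)
```

Any join or comparison between keys built on the two warehouses would therefore fail for every timestamp whose fractional part ends in zeros.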

Steps To Reproduce

Use the dbt_utils surrogate_key macro (https://github.com/dbt-labs/dbt-utils#surrogate_key-source) on any value that contains a timestamp.

Expected behavior

I think surrogate_key should produce the same output with every database adapter, independent of the database engine.

System information

Which database are you using dbt with?

  • [ ] postgres
  • [x] redshift
  • [ ] bigquery
  • [x] snowflake
  • [ ] other (specify: ____________)

Additional context

I’m not sure whether this belongs in the bug section or the feature section. I’m also not sure whether the issue affects many people, so maybe it is not that necessary to fix. However, if you do run into it, it takes an enormous amount of effort to fix.

P.S. I could fix the issue in the Redshift adapter so that it produces the same expected output. Just point me to where to start.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
joellabes commented, Sep 15, 2022

I’ve removed the 1.0 label, because this can be handled at any time without causing backwards-compatibility issues. The new macro zero_pad_timestamp_ms() can be added into any new surrogate key implementations, when people know that cross-database compatibility is important to them.

Old: {{ dbt_utils.surrogate_key(['user_id', 'account_created_at']) }}

New: {{ dbt_utils.surrogate_key(['user_id', zero_pad_timestamp_ms('account_created_at')]) }}

My only question now is whether this new macro belongs in dbt_utils, or whether we punt it back to the adapters to conform to an expected implementation defined in Core. @jtcohen6 and @dbeatty10, what say you?
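
For illustration, the normalization that a macro like zero_pad_timestamp_ms() would need to perform can be sketched in Python. This is only a sketch of the idea, not the macro itself: the helper name zero_pad_fraction and the six-digit width are assumptions, chosen to match the Snowflake output shown in the issue.

```python
def zero_pad_fraction(ts: str, digits: int = 6) -> str:
    """Pad the fractional seconds of 'YYYY-MM-DD HH:MM:SS[.fff]' with zeros.

    Hypothetical helper: a missing fraction is treated as all zeros, and a
    fraction that is already `digits` long is returned unchanged.
    """
    whole, _, fraction = ts.partition(".")
    return f"{whole}.{fraction.ljust(digits, '0')}"


# Redshift-style strings from the issue, normalized to one fixed width.
for value in ("2021-03-19 17:07:10.123321",
              "2021-03-19 17:07:10.123",
              "2021-03-19 17:07:10"):
    print(zero_pad_fraction(value))
```

Once every timestamp string has the same width, the hashed input is the same regardless of which warehouse produced it.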

0 reactions
joellabes commented, Sep 15, 2022

default: do nothing

it should still return a string, so I guess it can just be cast({{ column }} as {{ type_string() }}) but yeah basically nothing!

Edit: it doesn’t need to for surrogate_key, which already stringifies things, but I think that other hypothetical consumers should be able to rely on a known datatype coming back
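
The "default does nothing, but still returns a string" idea can be sketched as a per-adapter lookup with a fallback. This Python sketch only mimics the shape of such a dispatch; the registry, the function names, and the choice of Redshift as the only adapter with padding logic are assumptions for illustration, not dbt's dispatch mechanism.

```python
from typing import Callable, Dict


def _default(value: object) -> str:
    # Default: no padding, but callers can still rely on getting a string.
    return str(value)


def _redshift(value: object) -> str:
    # Assumed Redshift behavior: trailing zeros were trimmed, so restore them.
    whole, _, fraction = str(value).partition(".")
    return f"{whole}.{fraction.ljust(6, '0')}"


# Hypothetical registry: only adapters that need special handling are listed.
IMPLEMENTATIONS: Dict[str, Callable[[object], str]] = {"redshift": _redshift}


def zero_pad_timestamp_ms(value: object, adapter: str) -> str:
    """Pick the adapter-specific implementation, else fall back to the default."""
    return IMPLEMENTATIONS.get(adapter, _default)(value)


print(zero_pad_timestamp_ms("2021-03-19 17:07:10", "redshift"))
print(zero_pad_timestamp_ms("2021-03-19 17:07:10.000000", "snowflake"))
```

Returning a string from every branch keeps the contract the comment asks for: consumers other than surrogate_key get a known datatype back.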
