
BigQuery insert_overwrite incremental strategy fails for day partitioned tables


Description

I have a config for a BigQuery model like the one below (in the model file):

{{
  config(
    partition_by = {
        'field': 'ETL_CREATE_DTS',
        'data_type': 'timestamp',
        'granularity': 'day'
    },
    unique_key = "TRANSACTIONID",
    incremental_strategy = 'insert_overwrite'
  )
}}

ETL_CREATE_DTS is of type TIMESTAMP.

Running this model the first time (dbt run) succeeds. On the second run, however, when the generated query references the target table to determine which partitions to overwrite, it fails with the error:

Database Error in model msa_premium_transaction_fact_clean (models/cleaning/msa/msa_premium_transaction_fact_clean.sql)
  Query error: Cannot coerce expression (
            select as struct
                array_agg(distinct timestamp_trunc(ETL_CREATE_DTS, day))
            from `gcp-ent-property-etl-dev`.`sample_dw`.`msa_premium_transaction_fact_clean__dbt_tmp`
        ) to type STRUCT<ARRAY<DATE>> at [106:46]
  compiled SQL at target/run/dbt_etl_poc/models/cleaning/msa/msa_premium_transaction_fact_clean.sql

It turns out the SQL generated for this insert_overwrite is incorrect:

-- generated script to merge partitions into `gcp-ent-property-etl-dev`.`sample_dw`.`msa_premium_transaction_fact_clean`
      declare dbt_partitions_for_replacement array<date>;
      declare _dbt_max_partition timestamp;

      set _dbt_max_partition = (
          select max(ETL_CREATE_DTS) from `gcp-ent-property-etl-dev`.`sample_dw`.`msa_premium_transaction_fact_clean`
      );

      -- 1. create a temp table

....
      
      -- 2. define partitions to update
      set (dbt_partitions_for_replacement) = (
          select as struct
              array_agg(distinct timestamp_trunc(ETL_CREATE_DTS, day))
          from `project-id`.`sample_dw`.`msa_premium_transaction_fact_clean__dbt_tmp`
      );

      
      -- 3. run the merge statement
...

Notice that the declared variable, declare dbt_partitions_for_replacement array<date>;, will not match the output of step (2): timestamp_trunc returns a TIMESTAMP, so the aggregate produces an ARRAY<TIMESTAMP>, which BigQuery cannot coerce to the declared ARRAY<DATE>.
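
The mismatch can be reproduced outside dbt with a standalone BigQuery script (a minimal sketch; `sample_dw.events_tmp` is a placeholder table with a TIMESTAMP column ETL_CREATE_DTS, not a name from the issue):

declare dbt_partitions_for_replacement array<date>;

set (dbt_partitions_for_replacement) = (
    select as struct
        -- timestamp_trunc returns TIMESTAMP, so this aggregates to ARRAY<TIMESTAMP>
        array_agg(distinct timestamp_trunc(ETL_CREATE_DTS, day))
    from `sample_dw`.`events_tmp`
);
-- fails with: Cannot coerce expression (...) to type STRUCT<ARRAY<DATE>>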

Steps To Reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.

Expected behavior

I should be able to declare a BigQuery table partitioned by day on a timestamp field and have the insert_overwrite strategy work. A quick fix in the code would be to declare the variable's array element type to match the data type of the table's partition field (see the sketch below).
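
With a TIMESTAMP partition column, the corrected declaration in the generated script would look roughly like this (a hand-edited sketch of the generated SQL above, not actual dbt output):

-- declare the array with the partition column's actual type (TIMESTAMP here),
-- so the ARRAY<TIMESTAMP> produced by timestamp_trunc(..., day) coerces cleanly
declare dbt_partitions_for_replacement array<timestamp>;

set (dbt_partitions_for_replacement) = (
    select as struct
        array_agg(distinct timestamp_trunc(ETL_CREATE_DTS, day))
    from `project-id`.`sample_dw`.`msa_premium_transaction_fact_clean__dbt_tmp`
);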

It looks as if this line (https://github.com/fishtown-analytics/dbt/blob/2b48152da66dbd7f07272983bbc261f1b6924f20/plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql#L20) could be removed to allow the variable definition (https://github.com/fishtown-analytics/dbt/blob/2b48152da66dbd7f07272983bbc261f1b6924f20/plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql#L52) to reflect the exact data type.
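
In macro terms, the suggestion amounts to something like the following (an illustrative paraphrase of the linked lines, not the actual incremental.sql source; partition_by.data_type is the data type from the model's partition_by config):

{# paraphrase: instead of forcing the declared partition type to date, #}
{# derive the array element type from the configured data type, so it #}
{# matches whatever timestamp_trunc / date_trunc produces:            #}
declare dbt_partitions_for_replacement array<{{ partition_by.data_type }}>;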

System information

Which database are you using dbt with?

  • [ ] postgres
  • [ ] redshift
  • [x] bigquery
  • [ ] snowflake
  • [ ] other (specify: ____________)

The output of dbt --version:

installed version: 0.19.0
   latest version: 0.19.0

Up to date!

Plugins:
  - bigquery: 0.19.0
  - snowflake: 0.19.0
  - redshift: 0.19.0
  - postgres: 0.19.0

The operating system you’re using: macOS Catalina

The output of python --version: Python 2.7.16

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
jtcohen6 commented on Feb 12, 2021

Thanks for the report @osusam28 @noahbruegmann!

This is a regression in v0.19.0, an unintended side effect of adding granularity. As you noted, the fix should be quite straightforward: BigQuery now supports timestamp-type partitions, so there’s no need to cast to date as before. This may require a one-time full-refresh for folks whose current incremental models are partitioned by a date type (date(timestamp_col)) and will now be partitioned by a timestamp type (timestamp_trunc(timestamp_col, day)).

In any case, we’ll get this fixed for v0.19.1. We also need better test coverage for the matrix of partition data types and incremental strategies, to keep regressions like this one (or #3063) from happening in the future.
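
Concretely, the change in the partition expression looks like this (illustrative BigQuery SQL, not dbt output):

-- old scheme: the timestamp column was cast to DATE
select date(timestamp '2021-02-12 10:00:00');                  -- 2021-02-12 (DATE)

-- new scheme (v0.19.0+ with granularity): truncate, keeping the TIMESTAMP type
select timestamp_trunc(timestamp '2021-02-12 10:00:00', day);  -- 2021-02-12 00:00:00 UTC (TIMESTAMP)

Because the type of the stored partition expression changes, models built under the old scheme need the one-time dbt run --full-refresh mentioned above.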

0 reactions
funnel-arvid commented on Feb 18, 2021

@jtcohen6 Oh gosh, I just now realised you can choose and change the dbt version in dbt Cloud under your environment. I was on 0.17.0. Sorry for nothing!
