BigQuery insert_overwrite incremental strategy fails for day partitioned tables
Description
I have a config for a model (using BigQuery) like below (in the model file):
```sql
{{
    config(
        partition_by = {
            'field': 'ETL_CREATE_DTS',
            'data_type': 'timestamp',
            'granularity': 'day'
        },
        unique_key = "TRANSACTIONID",
        incremental_strategy = 'insert_overwrite'
    )
}}
```
`ETL_CREATE_DTS` is of type TIMESTAMP.

When running this model the first time (`dbt run`), it succeeds. However, the second time, when the query references the target table to find out which partitions to overwrite, it fails with this error:
```
Database Error in model msa_premium_transaction_fact_clean (models/cleaning/msa/msa_premium_transaction_fact_clean.sql)
  Query error: Cannot coerce expression (
    select as struct
        array_agg(distinct timestamp_trunc(ETL_CREATE_DTS, day))
    from `gcp-ent-property-etl-dev`.`sample_dw`.`msa_premium_transaction_fact_clean__dbt_tmp`
  ) to type STRUCT<ARRAY<DATE>> at [106:46]
  compiled SQL at target/run/dbt_etl_poc/models/cleaning/msa/msa_premium_transaction_fact_clean.sql
```
It turns out the generated SQL for this `insert_overwrite` is incorrect:
```sql
-- generated script to merge partitions into `gcp-ent-property-etl-dev`.`sample_dw`.`msa_premium_transaction_fact_clean`
declare dbt_partitions_for_replacement array<date>;
declare _dbt_max_partition timestamp;

set _dbt_max_partition = (
    select max(ETL_CREATE_DTS) from `gcp-ent-property-etl-dev`.`sample_dw`.`msa_premium_transaction_fact_clean`
);

-- 1. create a temp table
....

-- 2. define partitions to update
set (dbt_partitions_for_replacement) = (
    select as struct
        array_agg(distinct timestamp_trunc(ETL_CREATE_DTS, day))
    from `project-id`.`sample_dw`.`msa_premium_transaction_fact_clean__dbt_tmp`
);

-- 3. run the merge statement
...
```
Notice that the declaration `declare dbt_partitions_for_replacement array<date>;` will not match the output of step (2), because `timestamp_trunc` returns a TIMESTAMP, not a DATE.
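The mismatch is easy to confirm in the BigQuery console with a minimal standalone script (illustrative, not dbt-generated output); it fails with the same coercion error, and declaring the variable as `array<timestamp>` instead makes it succeed:

```sql
-- minimal repro of the type mismatch (illustrative, not dbt output)
declare dbt_partitions_for_replacement array<date>;

-- timestamp_trunc() returns a TIMESTAMP, so array_agg() yields ARRAY<TIMESTAMP>,
-- which cannot be coerced into the declared ARRAY<DATE>
set (dbt_partitions_for_replacement) = (
    select as struct
        array_agg(distinct timestamp_trunc(current_timestamp(), day))
);
```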
Steps To Reproduce
1. On BigQuery, configure an incremental model partitioned by a TIMESTAMP field with `'granularity': 'day'` and `incremental_strategy = 'insert_overwrite'`, as in the config above (a minimal model is sketched below).
2. Run `dbt run`; the initial build succeeds.
3. Run `dbt run` again; the incremental merge fails with the coercion error shown above.
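A minimal model file that should trigger the issue (the file name and column values are hypothetical, and `materialized = 'incremental'` is assumed since the config above implies it):

```sql
-- models/repro_insert_overwrite.sql (hypothetical minimal repro)
{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        unique_key = "TRANSACTIONID",
        partition_by = {
            'field': 'ETL_CREATE_DTS',
            'data_type': 'timestamp',
            'granularity': 'day'
        }
    )
}}

select
    current_timestamp() as ETL_CREATE_DTS,
    1 as TRANSACTIONID
```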
Expected behavior
I should be able to declare a BigQuery table partitioned by day on a timestamp field and have the insert_overwrite strategy work. A quick fix seems to be to declare the array variable with the same data type as the table's partition field (see the sketch below).
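For the table above, the generated script would then read as follows; this is a sketch of the expected output, not what dbt currently emits:

```sql
-- declare the variable with the partition field's own type ...
declare dbt_partitions_for_replacement array<timestamp>;

-- ... so the aggregated TIMESTAMP partitions can be assigned directly
set (dbt_partitions_for_replacement) = (
    select as struct
        array_agg(distinct timestamp_trunc(ETL_CREATE_DTS, day))
    from `project-id`.`sample_dw`.`msa_premium_transaction_fact_clean__dbt_tmp`
);
```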
It looks as if this line (https://github.com/fishtown-analytics/dbt/blob/2b48152da66dbd7f07272983bbc261f1b6924f20/plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql#L20) could be removed to allow the variable definition (https://github.com/fishtown-analytics/dbt/blob/2b48152da66dbd7f07272983bbc261f1b6924f20/plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql#L52) to reflect the exact data type.
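In macro terms, the fix amounts to deriving the declared array type from the configured partition type rather than hard-coding `date`. A rough sketch (hypothetical Jinja, not the actual macro source):

```sql
{# hypothetical sketch: derive the declared array type from the model's
   partition_by config instead of coercing timestamp/datetime down to date #}
{%- set partition_type = partition_by.data_type -%}

declare dbt_partitions_for_replacement array<{{ partition_type }}>;
```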
System information
Which database are you using dbt with?
- [ ] postgres
- [ ] redshift
- [x] bigquery
- [ ] snowflake
- [ ] other (specify: ____________)
The output of `dbt --version`:

```
installed version: 0.19.0
latest version: 0.19.0
Up to date!
Plugins:
- bigquery: 0.19.0
- snowflake: 0.19.0
- redshift: 0.19.0
- postgres: 0.19.0
```
The operating system you’re using: macOS Catalina
The output of `python --version`:

```
Python 2.7.16
```
Top GitHub Comments
Thanks for the report @osusam28 @noahbruegmann!
This is a regression in v0.19.0, an unintended side effect of the addition of `granularity`. As you noted, the fix here should be quite straightforward: BigQuery now supports `timestamp`-type partitions, so there's no need to cast to `date` as before. This may require a one-time full refresh for folks whose current incremental models are partitioned by a `date` type (`date(timestamp_col)`) and who will now be partitioning by a `timestamp` type (`timestamp_trunc(timestamp_col, day)`).

In any case, we'll get this fixed for v0.19.1. We also need better test coverage for the matrix of potential partition data types and incremental strategies, to prevent regressions like this one (or #3063) from happening in the future.
@jtcohen6 Oh gosh, I just now realised you can choose and change the dbt version in dbt Cloud under your environment. I was on 0.17.0. Sorry for nothing!