Duplicate Rows detected during snapshot
We have been battling a dbt bug for several months now that we were hopeful was solved in the release of 0.17.0.
Consistently, the snapshot of a table we have breaks due to the following error:
Database Error in snapshot user_campaign_audit (snapshots/user_campaign_audit.sql) 100090 (42P18): Duplicate row detected during DML action
Checking our snapshot table, there are indeed multiple rows with identical dbt_scd_id values. The table being snapshotted changes its schema with relatively high frequency. It's a core table that feeds a lot of downstream tables, so new columns are added fairly often. We also run a production dbt run every time we merge a branch into our master branch (we are running dbt on a GitLab CI/CD flow), so the snapshot can run multiple times a day.
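For anyone trying to confirm the same symptom, a query along these lines should list the affected keys. It is only a sketch: audit_table is the snapshot table name used in the workaround in this issue, and your snapshot table may be named differently.

```sql
-- Sketch only: list dbt_scd_id values that occur more than once
-- in the snapshot table (audit_table is the name used in this issue).
select
    dbt_scd_id,
    count(*) as row_count
from audit_table
group by dbt_scd_id
having count(*) > 1
order by row_count desc;
```

An empty result means every dbt_scd_id is unique, so the merge error would have to come from somewhere else.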
Our current approach to fix this is to create a copy of the snapshot table, reduce it to every distinct record, and then use that as the production version of the table. Something like:
create table broken_audit_table as (select distinct * from audit_table);
alter table broken_audit_table swap with audit_table;
grant ownership on table audit_table to role dbt;
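Note that select distinct * only collapses rows that are identical in every column. If the rows sharing a dbt_scd_id differ in any other column, a variant that keeps one row per dbt_scd_id may be needed. A rough sketch, assuming Snowflake's QUALIFY clause and the standard dbt snapshot column dbt_updated_at; deduped_audit_table is a hypothetical name, and which row to keep is a judgment call:

```sql
-- Sketch only: keep a single row per dbt_scd_id.
-- deduped_audit_table is a hypothetical name, not from the original issue.
create table deduped_audit_table as (
    select *
    from audit_table
    -- Assumption: the most recently updated row is the one worth keeping.
    -- If several rows tie on dbt_updated_at, one is picked arbitrarily.
    qualify row_number() over (
        partition by dbt_scd_id
        order by dbt_updated_at desc
    ) = 1
);
```

The swap and ownership grant would then follow the same pattern as above, with deduped_audit_table in place of broken_audit_table.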
Let me know if there is any more detail I can provide. Full stack is Fivetran/Snowflake/dbt.
I have the same issue on Exasol, with the error message “Unable to get a stable set of rows in the source tables”. There are duplicate rows in the temp table before the merge, even though the sources are clean. I figured out that a single quote within a varchar column caused the problem: after excluding all rows with single quotes in the string, the duplicates were gone.
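To check whether a source table contains such values, something like the following could be used. It is a sketch under assumptions: source_table and some_varchar_column are placeholder names that do not come from this thread, and the doubled single quote is the standard SQL way to escape a quote inside a string literal.

```sql
-- Sketch only: find rows whose varchar column contains a single quote.
-- source_table and some_varchar_column are hypothetical placeholder names.
select *
from source_table
where some_varchar_column like '%''%';
```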
I think in my case the issue may have been caused by running two instances of dbt concurrently. We have been migrating Airflow instances and had a dbt DAG running on both instances at one point. I suspect that the snapshot command ran at the same time on both by accident, and that this is the root cause in my case.