Custom schemas: table already exists
Issues with re-running workflows when using custom schemas.
When I create a model with a custom schema configured:
-- models/clean/clean_accounts.sql
{{ config(alias='accounts', schema='clean', materialized='table') }}
select * from {{ source('incoming', 'accounts') }}
I am able to run the workflow successfully once:
> dbt run
...
Completed successfully
However, if I run the same workflow again I get an error:
> dbt run
...
Runtime Error in model clean_accounts (models/clean/clean_accounts.sql)
Database Error
org.apache.spark.sql.AnalysisException: `dev_clean`.`accounts` already exists.;
Instead, the table should be dropped and recreated. If we repeat the same exercise without the schema='clean' configuration, everything works as expected.
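For context on the dev_clean name in the error: dbt's default schema-naming logic appends a model's custom schema to the target schema, so schema='clean' with a target schema of dev produces dev_clean. A minimal Python sketch of that default behaviour (the real logic lives in dbt's generate_schema_name macro; the function below is illustrative only):

# Illustrative only: mirrors dbt's default generate_schema_name behaviour,
# not the actual dbt source.
def generate_schema_name(custom_schema_name, target_schema):
    if custom_schema_name is None:
        return target_schema
    return f"{target_schema}_{custom_schema_name.strip()}"

print(generate_schema_name("clean", "dev"))  # -> "dev_clean"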

hey @eamontaaffe - thanks for your thoughtful writeup here! I appreciate your patience - it was hard to get back in the swing of the dbt-spark plugin, but I’m excited to get this (and the other open PRs in this repo) merged!
I think the change you’ve proposed here is uncontroversial - let me pick this up with you in the open PR.
In the spirit of figuring out what was actually going wrong with
adapter.get_relation, I discovered the cause: in Spark, unlike in other dbt adapters,databaseandschemaare one and the same. Only theschemaproperty of the materialization is updated, however, when a custom schema is declared in a model config. When dbt checks the cache here for a table matching both thedatabaseandschemaof the model, it supplies the custom schema forschemabut the default (target.database) fordatabase.I think we should fix
get_relation, rather than the workaround in #42. We could redefine allget_relationcalls to look likeOr we could re-implement
cache.get_relationsfor the Spark adapter to only check for a matchingschema. I’m leaning toward the latter, what do you think @drewbanin?
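As a rough illustration of the underlying idea (not the actual dbt-spark fix), one could make the Spark adapter's relation lookup ignore the database argument entirely; the class name, import path, and override point below are assumptions based on dbt's SQLAdapter interface:

# Sketch only: a Spark adapter override that ignores `database` when looking
# up relations, since Spark treats database and schema as one namespace.
from dbt.adapters.sql import SQLAdapter

class SparkAdapter(SQLAdapter):
    def get_relation(self, database, schema, identifier):
        # Pass the schema for both arguments so the lookup key matches the
        # key under which Spark relations are cached.
        return super().get_relation(schema, schema, identifier)

Overriding cache.get_relations instead would have a similar effect; either way, the point is that the lookup key and the cached key need to agree on what "database" means for Spark.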