[CT-1358] [Feature] Add column type constraints as dbt native configs
Is this your first time submitting a feature request?
- I have read the expectations for open source contributors
- I have searched the existing issues, and I could not find an existing issue for this feature
- I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Problem:
- Data types are inferred by the database after the fact, and/or a dbt user has to explicitly cast data types in their SQL. Because of this, there isn't much incentive to ensure that dbt models have the data types consumers expect to depend on (it's tedious work 😕). This may not be a problem for 50 dbt models, but it becomes a big problem once you have 2000+ dbt models. There should be an easier way to configure data type constraints for any dbt model, with options to enable or disable data type enforcement (think: what mypy does for the Python static typing experience).
Solution:
- Make data types native to the dbt experience
- Example BigQuery Constraints Macro: https://gist.github.com/sungchun12/f7ea081773ae824a83294649530d6e41
- Related discussion: https://github.com/dbt-labs/dbt-core/discussions/5244#discussioncomment-3880254
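To make the "enable/disable enforcement" idea concrete, here is a minimal Python sketch of a mypy-like check: it compares the columns a model declares (names, data types, order) against the columns the warehouse actually produced and reports every mismatch. `ColumnSpec`, `check_contract`, and the `enforce` flag are hypothetical names for illustration, not dbt APIs; in a real implementation the "actual" columns would come from the warehouse's metadata (for example, information_schema).

```python
from dataclasses import dataclass


# Hypothetical column declaration; not part of dbt.
@dataclass(frozen=True)
class ColumnSpec:
    name: str
    data_type: str


def check_contract(declared, actual, enforce=True):
    """Compare declared columns with the columns the database produced.

    Returns a list of mismatch messages. With enforce=False the check is
    skipped entirely, much like turning a type checker off.
    """
    if not enforce:
        return []
    errors = []
    declared_names = [c.name for c in declared]
    actual_names = [c.name for c in actual]
    # Columns that are missing from, or unexpected in, the built model.
    for name in declared_names:
        if name not in actual_names:
            errors.append(f"missing column: {name}")
    for name in actual_names:
        if name not in declared_names:
            errors.append(f"unexpected column: {name}")
    # Data type mismatches, compared case-insensitively.
    actual_types = {c.name: c.data_type.lower() for c in actual}
    for col in declared:
        got = actual_types.get(col.name)
        if got is not None and got != col.data_type.lower():
            errors.append(
                f"type mismatch on {col.name}: expected {col.data_type}, got {got}"
            )
    # Column order is only reported once names and types already agree.
    if not errors and declared_names != actual_names:
        errors.append(
            f"column order differs: expected {declared_names}, got {actual_names}"
        )
    return errors


declared = [ColumnSpec("id", "int64"), ColumnSpec("name", "string")]
actual = [ColumnSpec("id", "string"), ColumnSpec("name", "string")]
print(check_contract(declared, actual))
# ['type mismatch on id: expected int64, got string']
```

Returning a list of messages (rather than raising on the first problem) mirrors how a type checker reports all errors at once, so a project with thousands of models can surface every drift in one run.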
Describe alternatives you’ve considered
Custom Materialization Macro in BigQuery: https://www.loom.com/share/1f1f190e66254d12984962c613e8082d
This is a general design pattern, but it uses meta configs to enable this functionality. That is brittle, because it injects custom configs into the manifest that aren't guaranteed to behave as expected.
Who will this benefit?
dbt users with large projects who need extra robustness in their developer experience, and downstream users, who get the data they expect: data types and all.
Are you interested in contributing this feature?
Yes, and I'll be working with a community member, @jonathanneo.
Anything else?
Research:
- Postgres Constraints: https://www.postgresql.org/docs/current/ddl-constraints.html#id-1.5.4.6.6
- Snowflake ONLY enforces not null constraints; the rest are nice-to-have metadata: https://docs.snowflake.com/en/sql-reference/constraints-overview.html#supported-constraint-types
- BigQuery Macro Demo: https://www.loom.com/share/1f1f190e66254d12984962c613e8082d
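The Postgres and Snowflake notes above mean the same declared constraint can be enforced on one platform and be metadata-only on another. A minimal Python sketch of how an adapter might sort declared constraints into "enforced" and "metadata only" follows. The `ENFORCED_CONSTRAINTS` table is an illustrative assumption drawn from the two linked docs (Postgres enforces all four kinds listed; Snowflake enforces only not null), and `split_constraints` is a hypothetical helper, not part of dbt.

```python
# Illustrative only: which constraint kinds each platform enforces,
# based on the linked Postgres and Snowflake docs. Not exhaustive.
ENFORCED_CONSTRAINTS = {
    "postgres": {"not_null", "unique", "primary_key", "foreign_key"},
    "snowflake": {"not_null"},  # unique/primary/foreign keys are metadata only
}


def split_constraints(platform, constraints):
    """Split declared constraints into (enforced, metadata_only) lists.

    Hypothetical helper: an adapter could warn about the metadata-only
    entries so users don't assume the database is checking them.
    Unknown platforms treat every constraint as metadata only.
    """
    enforced_kinds = ENFORCED_CONSTRAINTS.get(platform, set())
    enforced = [c for c in constraints if c in enforced_kinds]
    metadata_only = [c for c in constraints if c not in enforced_kinds]
    return enforced, metadata_only


declared = ["not_null", "unique", "primary_key"]
print(split_constraints("snowflake", declared))
# (['not_null'], ['unique', 'primary_key'])
print(split_constraints("postgres", declared))
# (['not_null', 'unique', 'primary_key'], [])
```

Keeping this table on the adapter side matches the split proposed below: dbt-core owns the shared vocabulary of constraints, while each adapter decides what its database can actually enforce.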
Considerations:
- Embed the configs in dbt-core, so each adapter can consume stable configs to adjust its table materialization macros
- Enforce column positions based on the schema config
- dbt-core is the hub for native configs, while adapters own the specific implementations, since each database differs in which constraints it can actually enforce
- Think about not null and default values for a table
- Work with Jon Neo from Canva
- Include a check constraint, since Postgres supports it?
- What if constraints could be defined once for a subfolder path in dbt_project.yml and then automatically create not null constraints in the DDL?
- Leverage how seed configs enable data types: https://docs.getdbt.com/reference/resource-configs/column_types
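Several of the considerations above (column positions taken from the schema config, not null, default values, an optional check constraint) meet in the DDL an adapter would emit. Below is a minimal Python sketch that renders a Postgres-style CREATE TABLE from a hypothetical per-column config. The config keys (`name`, `data_type`, `not_null`, `default`, `check`) and `render_create_table` are assumptions for illustration, not dbt's actual config schema or macros. Column order in the DDL follows the order of the declared list, which is one way to enforce column positions.

```python
def render_create_table(relation, columns):
    """Render a Postgres-style CREATE TABLE statement.

    `columns` is an ordered list of dicts with hypothetical keys:
    name, data_type, and optional not_null, default, check.
    The DDL lists columns in exactly the order they are declared.
    """
    lines = []
    for col in columns:
        parts = [col["name"], col["data_type"]]
        if "default" in col:
            # The default is emitted as a raw SQL expression,
            # so string literals must already be quoted.
            parts.append(f"default {col['default']}")
        if col.get("not_null"):
            parts.append("not null")
        if "check" in col:
            parts.append(f"check ({col['check']})")
        lines.append("    " + " ".join(parts))
    body = ",\n".join(lines)
    return f"create table {relation} (\n{body}\n)"


ddl = render_create_table(
    "analytics.orders",
    [
        {"name": "order_id", "data_type": "bigint", "not_null": True},
        {"name": "status", "data_type": "text", "default": "'pending'"},
        {"name": "amount", "data_type": "numeric(12, 2)", "check": "amount >= 0"},
    ],
)
print(ddl)
# create table analytics.orders (
#     order_id bigint not null,
#     status text default 'pending',
#     amount numeric(12, 2) check (amount >= 0)
# )
```

Because the types and constraints live in the DDL itself, Postgres rejects violating rows at write time instead of relying on after-the-fact casts; a real adapter would wrap this logic in its own table materialization macro.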
I volunteer to help contribute the Snowflake portions and to help test on a real-world project. This will especially help with incremental updates!
Good spike! Just in time, too: more and more data warehouse providers are adding constraints to their table definitions. For example, Databricks just added table constraints in October 2022: https://docs.databricks.com/tables/constraints.html