Add test groups
Describe the feature
Create a feature to specify “test groups” such that there is a shorthand for specifying several related tests on a model at once. Currently, Fishtown best practices recommend that you specify unique and not_null tests on the primary key of each of your models - these tests are logically related. This feature would allow you to specify just a primary key test group on a single field, and would automatically generate and compile both a unique and a not_null test for that model on a dbt test invocation. This could be extended to other logically grouped tests on single fields.
Describe alternatives you’ve considered
Alternatives include explicitly calling each test, as it works today! This is still absolutely a viable approach, and has the benefit of forcing analysts to explicitly declare their assumptions about their data.
Additional context
Example:
current state:

version: 2
models:
  - name: cool_data_model
    columns:
      - name: id
        tests:
          - unique
          - not_null

proposed:

version: 2
models:
  - name: cool_data_model
    primary_key: id

Obviously the example is up for debate - it might be worth keeping column-level definitions in there, and maybe something that explicitly says the word test, but generally, a single specification could make the definition of these tests more concise.
Who will this benefit?
This is primarily for analytics engineers to clean up the definitions in a schema.yml file.
Are you interested in contributing this feature?
For sure!
Neat idea @dave-connors-3! What you’re proposing sounds along the lines of a built-in YAML anchor, i.e.
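The snippet that followed here did not survive. As a rough sketch of the pattern only, assuming a plain schema.yml and a hypothetical second model named other_data_model, a hand-written anchor might look like this:

```yaml
version: 2

models:
  - name: cool_data_model
    columns:
      - name: id
        # define the anchor once; the name primary_key is only a label
        tests: &primary_key
          - unique
          - not_null

  # other_data_model is a hypothetical model, added for illustration
  - name: other_data_model
    columns:
      - name: id
        # reuse the same list of tests through the alias
        tests: *primary_key
```

YAML anchors and aliases are resolved within a single file, so the &primary_key definition would have to be repeated in every schema file that uses it.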
Without, of course, having to define the initial &primary_key in each file.
I heard a related idea the other day, with a slightly different angle, and I’m now convinced that the two might be connected.
Right now, dbt runs separate queries for each of unique, not_null, etc., even if they’re all defined on the same column in the same table. What if there was a way for dbt to consolidate those tests into a single query?
Essentially, I’m thinking of a genuinely different custom schema test that acts as a “combo” of existing builtin tests:
Then:
Hypothetically, it would be more efficient in terms of database time and resources. The downside is ambiguity: should the test fail, it could have been because the column was null, or not unique, or both null and not unique.
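As a sketch of how such a combo test might be applied from a schema file, assuming the project defines a custom schema test named primary_key (the name, and the test itself, are hypothetical and not taken from this thread):

```yaml
version: 2

models:
  - name: cool_data_model
    columns:
      - name: id
        tests:
          # hypothetical custom test: one query that checks both
          # uniqueness and non-nullness of the column
          - primary_key
```

With this shape the column-level definition stays in place, and a failure would be reported under the single primary_key test rather than under unique or not_null separately.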
What do you think?
I’m facing the same problem currently. I’d like to add tests to my repo to enforce (for instance) compliance with the dbt style guide, but what I end up doing is adding a lot of tests to EVERY model in dbt (100+ and counting).
Since those are the same tests every time, but have to be invoked from different model yml files, I couldn’t find a solution using YAML anchors. I don’t see a better solution than having the option of defining a test group (@noel I’ll be really explicit on the naming!).