User-defined default selection criteria
Credit where due: This was @aescay’s idea many months ago!
Describe the feature
We’ve increasingly heard a desire from community members to limit `dbt run` in development to only a subset of their project, or to always exclude a set of infrequently run models. Trying to solve for this with `enabled` gets tricky quickly, because disabled resources cannot participate in the DAG and will raise dependency errors accordingly.
Instead, what if there were a way to redefine the default selection criteria? I think the right construct for this is a yaml selector (as in `selectors.yml`). We could support a new selector property, `default: true|false`, or check for a selector named `default`. The former is almost certainly better; we’d just need to raise an error if multiple selectors have `default: true`.
If specified, that selector should override the default includes + excludes defined here:
Then, `dbt run` would have the same effect as `dbt run --selector my_default_selector`.
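A minimal sketch of the lookup this implies, assuming selectors have already been parsed into plain dictionaries with a `name` and an optional `default` key. The function `find_default_selector` and the error class are hypothetical names for illustration, not part of dbt.

```python
from typing import Any, Dict, List, Optional


class MultipleDefaultSelectorsError(Exception):
    """Hypothetical error for projects with more than one default selector."""


def find_default_selector(selectors: List[Dict[str, Any]]) -> Optional[str]:
    """Return the name of the selector marked `default: true`, if any.

    Assumes each selector is a dict shaped like an entry in selectors.yml.
    """
    defaults = [s["name"] for s in selectors if s.get("default") is True]
    if len(defaults) > 1:
        raise MultipleDefaultSelectorsError(
            "Only one selector may set default: true, found: " + ", ".join(defaults)
        )
    # No default selector is the common case: callers fall back to built-in criteria.
    return defaults[0] if defaults else None


selectors = [
    {"name": "nightly", "definition": "tag:nightly"},
    {"name": "my_default_selector", "default": True, "definition": "tag:daily"},
]
default_name = find_default_selector(selectors)
```

Returning `None` rather than raising when no default exists keeps the existing behavior intact for projects that never define one.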
Questions
- How to combine the user-defined default with additional selection criteria passed via CLI syntax? Should the two be combined, or should the CLI criteria entirely override the default? I’m leaning toward total override for two reasons:
  - It’s what happens today: `dbt ls` really means `dbt ls --select fqn:* source:* exposure:*`; as soon as the user says `-s something_else`, the command becomes `dbt ls -s something_else`.
  - Override feels like the only way to make sense of `dbt ls --selector not_the_default`.
- Can we support different default selectors in different environments? I think it makes a lot of sense to perma-exclude certain models in development, but still select them in production. As it turns out, `selectors.yml` already supports Jinja today, so this could be as simple as:
```yaml
selectors:
  - name: prod
    description: Select everything in prod
    default: "{{ target.name == 'prod' | as_bool }}"
    definition: 'fqn:* source:* exposure:*'
  - name: dev
    description: Avoid unpleasant surprises in dev
    default: "{{ target.name == 'dev' | as_bool }}"
    definition:
      union:
        - 'fqn:* source:* exposure:*'
        - exclude:
            - unpleasant
            - surprises
```
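To make the `dev` definition concrete, here is a rough set-based sketch of what a `union` containing an `exclude` resolves to: everything matched by the included entries, minus anything matched by the excluded ones. The node names are made up, and `unpleasant` and `surprises` are treated as plain model names purely for illustration; real dbt resolves these criteria against the project graph.

```python
from typing import Iterable, Set


def resolve_union_with_exclude(
    included: Iterable[Set[str]], excluded: Iterable[Set[str]]
) -> Set[str]:
    """Union the included sets, then subtract the union of the excluded sets."""
    selected: Set[str] = set().union(*included)
    removed: Set[str] = set().union(*excluded)
    return selected - removed


# Hypothetical project contents, standing in for 'fqn:* source:* exposure:*'.
everything = {"orders", "customers", "unpleasant", "surprises"}

dev_selection = resolve_union_with_exclude(
    included=[everything],
    excluded=[{"unpleasant"}, {"surprises"}],
)
print(sorted(dev_selection))  # ['customers', 'orders']
```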
@TeddyCr Nice progress so far! Sorry I was out last week, so just responding now. I have a few quick comments:
where to get selector info?
I think you want to use `self.selectors` rather than `self.manifest_selectors` (which has already been serialized + expanded, to print a user-friendly version in `manifest.json`). In order to check `self.selectors` for the default selector, you’ll need to include `default: True|False` in `SelectorConfig`, and also in the output of the `parse_from_selectors_definition` method, since they currently just include the name and definition of each selector.

I think there are a few different ways to go about this. The one I can think of is to change `SelectorConfig` from being just `Dict[str, SelectionSpec]` to instead being `Dict[str, Dict[str, str], Dict[str, SelectionSpec]]`.

For the example selectors in the original comment, instead of:
They would be represented as:
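As a rough sketch of the two shapes under discussion, assuming a hypothetical `SelectorEntry` wrapper and using plain strings in place of dbt's `SelectionSpec` objects (neither the wrapper nor the placeholder values come from the dbt codebase):

```python
from dataclasses import dataclass
from typing import Dict

# Stand-in for dbt's SelectionSpec; a plain string keeps the sketch runnable.
SelectionSpec = str


@dataclass
class SelectorEntry:
    """Hypothetical wrapper pairing a selector's definition with its default flag."""
    default: bool
    definition: SelectionSpec


# Current shape: selector name -> definition only.
selectors_before: Dict[str, SelectionSpec] = {
    "prod": "<prod spec>",
    "dev": "<dev spec>",
}

# Proposed shape: selector name -> default flag plus definition.
selectors_after: Dict[str, SelectorEntry] = {
    "prod": SelectorEntry(default=False, definition="<prod spec>"),
    "dev": SelectorEntry(default=True, definition="<dev spec>"),
}


def get_selector(name: str) -> SelectionSpec:
    """Return only the definition, so existing callers keep receiving a spec."""
    return selectors_after[name].definition
```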
Then `get_selector` would return `self.selectors[name].definition`, instead of just `self.selectors[name]`.

Alternatively, you could grab the name of the default selector as a pointer earlier on, and store that somewhere. I think you’ll need to alter the `selectors` config object one way or another.

order of operations
Once you can reliably access the default selector definition from the project config, I think you want to slightly adjust the order of the logic you’ve got above, so that you do not use the default selector if the user has passed `--models` or `--exclude`. How about something like:

testing
The relevant existing unit tests are in `test_graph_selector_parsing`. If you make changes to the structure of `SelectorConfig`/`selectors` as I mentioned above, it’s likely you’ll need to adjust this and a few other tests that mock what selectors look like.

I also think we’ll want to add an integration test, to make sure this works end-to-end!
Let me know if you find the comments above helpful, and if you’re able to give it another go 😃
@TeddyCr I had been thinking that the change here should be in `graph/cli.py`, and that the approach should be to check `selectors` and, if a default selector is found, override the default includes + excludes, which are used as inputs to `parse_difference` lower down.

The way this works in practice is that a task calls `get_selection_spec`. First it checks to see if a `--selector` was passed; otherwise, it passes `--models`/`--select` and `--exclude` into `parse_difference`:
https://github.com/dbt-labs/dbt/blob/45fe76eef4b4b82ae1442f00310e0c6a121774f2/core/dbt/task/list.py#L171-L176
https://github.com/dbt-labs/dbt/blob/45fe76eef4b4b82ae1442f00310e0c6a121774f2/core/dbt/task/compile.py#L40-L45
https://github.com/dbt-labs/dbt/blob/45fe76eef4b4b82ae1442f00310e0c6a121774f2/core/dbt/task/freshness.py#L139-L150
So now I’m thinking a better approach here may be to add a check within `get_selection_spec`. If none of `--selector_name`, `--models`/`--select`, and `--exclude` is set, then use `self.config.get_selector('default')`. Here’s some code for `task/list.py`:

This works for a selector named `default`! But let’s actually do this by defining a new selector property, `default: true|false`, and adding that as an argument to `get_selector`. We’ll also want to handle the (very common) case in which a default selector is not defined, by having some way to check for it first.