Subselectors for state:modified

This is a follow-on from the initial proposals in #2465, #2641. Most of the required work is around exposing + sugaring the foundational work in #2695.

The way state comparison works is by identifying discrepancies between two manifests. When comparing between a past prod manifest and the current development manifest, discrepancies can be the result of two things:

Changes made to the project in development
Env-aware logic that causes different behavior based on the target, env vars, etc.

We’re going to do our best to capture only changes that are the result of development. If someone’s project has tons of intricate env-aware logic, they’ll run more models than they want (i.e. more false positives). So we’re giving them the option to turn off some knobs, in the form of more-specific subselectors.
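As a rough illustration of the comparison described above, the sketch below treats each manifest as a plain dict keyed by node ID and reports the IDs whose entries differ. The dict layout, the node IDs, and the `find_discrepancies` name are all hypothetical; dbt's real manifest is far richer than this.

```python
from typing import Any, Dict, List

# Hypothetical shape: node ID -> node properties.
Manifest = Dict[str, Dict[str, Any]]


def find_discrepancies(previous: Manifest, current: Manifest) -> List[str]:
    """Return IDs of nodes in `current` that differ from `previous`.

    Nodes missing from `previous` also count as discrepancies
    (an assumption of this sketch).
    """
    return sorted(
        node_id
        for node_id, node in current.items()
        if previous.get(node_id) != node
    )


prod = {
    "model.shop.orders": {"raw_sql": "select 1", "materialized": "table"},
    "model.shop.customers": {"raw_sql": "select 2", "materialized": "table"},
}
dev = {
    # Edited during development: a real change.
    "model.shop.orders": {"raw_sql": "select 1 as id", "materialized": "table"},
    # Not edited, but env-aware logic yields a different config: a false positive.
    "model.shop.customers": {"raw_sql": "select 2", "materialized": "view"},
}

print(find_discrepancies(prod, dev))
# ['model.shop.customers', 'model.shop.orders']
```

Both nodes are flagged even though only one was edited, which is the false-positive problem the subselectors are meant to address.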

Subselectors

There is potential for overlap: a single change can trigger multiple modification categories.

state:modified.contents:

Models: raw file contents changed
Snapshots: raw file contents changed
Data tests: raw file contents changed
Analyses: raw file contents changed
Seeds: raw file contents changed
- However, if the file is >1 MB, we cannot compare raw file contents, so we raise a warning and just compare based on file path instead.

This alone would get a lot of people what they want! It’s basically “just hash the files,” excluding YAML configuration.
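A minimal sketch of the "just hash the files" idea, including the path-based fallback for large seeds. The function names, the choice of SHA-256, and the exact byte threshold are assumptions made for illustration, not dbt's implementation.

```python
import hashlib
import warnings
from typing import Tuple

# Assumed cutoff for the "1 MB" limit mentioned above.
MAX_COMPARE_BYTES = 1024 * 1024


def fingerprint(path: str, raw: bytes, is_seed: bool = False) -> Tuple[str, str]:
    """Return a (kind, value) pair identifying a file's contents."""
    if is_seed and len(raw) > MAX_COMPARE_BYTES:
        # Too large to compare by contents: warn and fall back to the path.
        warnings.warn(f"{path} is larger than 1 MB; comparing by file path only")
        return ("path", path)
    return ("sha256", hashlib.sha256(raw).hexdigest())


def contents_modified(old: Tuple[str, str], new: Tuple[str, str]) -> bool:
    """A node counts as modified when its fingerprint changed."""
    return old != new


old = fingerprint("models/orders.sql", b"select 1")
new = fingerprint("models/orders.sql", b"select 1 as id")
print(contents_modified(old, new))  # True
```

Because only raw file contents feed the hash, configuration set in YAML never affects the result.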

state:modified.configs:

Models: changes to materialized, quoting, bind, transient, sort/dist, partition_by, incremental_strategy…
- This category captures changes in dbt_project.yml or {{config()}} blocks. If the changes are made in a {{config()}} block, they will also be picked up as content changes.
- If someone has env-aware logic for materialized, where a model is a view in dev and a table in prod, they will not want to include this.
Snapshots: unique_key, strategy, …
Seeds: quoting, column_types
Schema tests: severity changes
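To make the env-aware case concrete, here is a small sketch that lists which config keys differ between two versions of a node. The `changed_config_keys` helper and the example configs are hypothetical.

```python
from typing import Any, Dict, List


def changed_config_keys(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Return the config keys whose values differ, including added or removed keys."""
    keys = set(old) | set(new)
    return sorted(key for key in keys if old.get(key) != new.get(key))


# A model that is a table in prod and a view in dev because of env-aware logic.
prod_config = {"materialized": "table", "transient": False}
dev_config = {"materialized": "view", "transient": False}

print(changed_config_keys(prod_config, dev_config))
# ['materialized']
```

Nobody edited this model, yet its config differs between environments, so a project like this would likely want to leave state:modified.configs out of its selection.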

state:modified.descriptions:

If persist_docs is turned on for a node, description changes count as modifications. (If persist_docs covers only columns, only column descriptions count; if only relations, only top-level descriptions; if both, then both.)
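The rule above could be sketched as follows. The node layout (a `persist_docs` dict with `relation` and `columns` flags, a top-level `description`, and a `columns` mapping of column name to description) is a simplified assumption, as is reading the flags from the newer version of the node.

```python
from typing import Any, Dict


def descriptions_modified(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    """Description changes only count where persist_docs is enabled."""
    persist = new.get("persist_docs", {})
    if persist.get("relation") and old.get("description") != new.get("description"):
        return True
    if persist.get("columns") and old.get("columns") != new.get("columns"):
        return True
    return False


old_node = {
    "persist_docs": {"relation": True, "columns": False},
    "description": "All orders",
    "columns": {"id": "Primary key"},
}
# Only a column description changed, and column docs are not persisted.
new_node = {**old_node, "columns": {"id": "Order ID"}}

print(descriptions_modified(old_node, new_node))  # False
```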

state:modified.database_representations:

Models: changes to the configured database, schema, identifier. This value represents the manual input only, and it’s different from the resolved database representation, which depends on the target and generate_x_name macros.
- If someone manually sets schema = target.schema, or schema = target.schema ~ '_suffix' instead of using the generate_schema_name macro, that will register as a change between environments and they’ll want to turn this off.
- Depending on the generate_x_name logic and the current environment, a change to the configured value may not actually change the database representation. We’ll still register it as a modification.
Seeds: treated the same as models.
Sources: database, schema, or identifier has changed. If someone has env-aware definitions, they’ll want to turn this off.
Snapshots: treated the same as sources.
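The difference between the configured and the resolved value can be sketched like this. `resolve_schema` approximates the default generate_schema_name behavior (target schema, with the custom schema appended after an underscore); the function names and the example schemas are hypothetical.

```python
from typing import Optional


def resolve_schema(target_schema: str, configured: Optional[str]) -> str:
    """Approximation of the default generate_schema_name logic."""
    if configured is None:
        return target_schema
    return f"{target_schema}_{configured.strip()}"


def representation_modified(old_configured: Optional[str],
                            new_configured: Optional[str]) -> bool:
    """Compare the configured value only, never the resolved one."""
    return old_configured != new_configured


prod_target, dev_target = "analytics", "dbt_alice"

# Case 1: schema configured as a literal. The resolved schemas differ by
# environment, but the configured value is identical, so nothing is flagged.
print(resolve_schema(prod_target, "marketing"))         # analytics_marketing
print(resolve_schema(dev_target, "marketing"))          # dbt_alice_marketing
print(representation_modified("marketing", "marketing"))  # False

# Case 2: schema manually set to target.schema ~ '_suffix'. The configured
# value itself now depends on the target, so it is flagged in every environment.
print(representation_modified(prod_target + "_suffix", dev_target + "_suffix"))  # True
```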

Default behavior

I think state:modified should include all changes from all the categories above. The question mark is whether database_representations should be included in the default, since this is the area where people do the most custom things, and it’s the knob that will likely be switched off most often. For the sake of clarity, I think it’s best to have the state:modified selector be a superset of all modified subselectors.
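One way to picture the superset relationship: each subselector checks its own fields, and state:modified matches when any of them does. The field names and the `CATEGORY_FIELDS` mapping below are invented for illustration, and the persist_docs condition on descriptions is left out for brevity.

```python
from typing import Any, Dict, Set

# Hypothetical mapping from subselector name to the node fields it inspects.
CATEGORY_FIELDS = {
    "contents": ["raw_contents"],
    "configs": ["config"],
    "descriptions": ["description"],
    "database_representations": ["database", "schema", "identifier"],
}


def modified_categories(old: Dict[str, Any], new: Dict[str, Any]) -> Set[str]:
    """Return every category in which the two node versions differ."""
    return {
        category
        for category, fields in CATEGORY_FIELDS.items()
        if any(old.get(field) != new.get(field) for field in fields)
    }


def is_modified(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    """state:modified as the union of all subselectors."""
    return bool(modified_categories(old, new))


old_node = {
    "raw_contents": "{{ config(materialized='view') }} select 1",
    "config": {"materialized": "view"},
    "schema": "analytics",
}
# Editing the config() block changes both the file contents and the config.
new_node = {
    "raw_contents": "{{ config(materialized='table') }} select 1",
    "config": {"materialized": "table"},
    "schema": "analytics",
}

print(sorted(modified_categories(old_node, new_node)))
# ['configs', 'contents']
print(is_modified(old_node, new_node))
# True
```

The example also shows the overlap noted earlier: a single edit to a config() block lands in two categories at once.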

Future art

state:modified.macros:

A macro’s raw contents have changed
By extension, state:modified.macros+ would include all downstream models, tests, etc. that call (directly or indirectly) a macro that has changed
This also includes implicit macro dependencies such as generate_schema_name

state:modified.vars:

A var value has changed
By extension, state:modified.vars+ would include all downstream models, tests, etc. that call (directly or indirectly) a var that has changed

We will update state:modified to include both of these as well.
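The "+" behavior for macros and vars amounts to a downstream graph traversal. Here is a minimal sketch, assuming a hypothetical `children` mapping from each macro, var, or node to the things that call or depend on it directly.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def with_downstream(changed: Iterable[str], children: Dict[str, List[str]]) -> Set[str]:
    """Return the changed items plus everything that depends on them, directly or indirectly."""
    seen = set(changed)
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical dependency edges: key -> things that call or depend on it.
children = {
    "macro.cents_to_dollars": ["macro.format_money"],
    "macro.format_money": ["model.orders"],
    "model.orders": ["test.not_null_orders_id"],
    "macro.unused": [],
}

print(sorted(with_downstream(["macro.cents_to_dollars"], children)))
# ['macro.cents_to_dollars', 'macro.format_money', 'model.orders', 'test.not_null_orders_id']
```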

Issue Analytics

Created 3 years ago
Reactions: 5
Comments: 11 (7 by maintainers)

Top GitHub Comments

4 reactions

dwallace0723 commented, Nov 11, 2020

@jtcohen6 just here to give a big 👍 to the idea of modification subselectors. Specifically, state:modified.contents. We have such a heavy reliance on environment variables in our dbt project that using state:modified is effectively a non-starter for us right now. Would love to be able to use it in the future though!

0 reactions

ncolomer commented, Feb 24, 2021

This definitely looks promising!

Up to now, the current state:modified feature only partially helps us because, for some of our dbt models, we rely a lot on variables and Jinja templating. Today, this forces us to always redeploy (i.e. --full-refresh) those models, and some can be costly because they materialize data.

The state:modified.macros and state:modified.vars subselectors (that would be included by default in state:modified) would be a great addition to solve this problem.
