Subselectors for state:modified

This is a follow-on from the initial proposals in #2465 and #2641. Most of the required work is around exposing and sugaring the foundational work in #2695.

The way state comparison works is by identifying discrepancies between two manifests. When comparing between a past prod manifest and the current development manifest, discrepancies can be the result of two things:

  1. Changes made to the project in development
  2. Env-aware logic that causes different behavior based on the target, env vars, etc.

We’re going to do our best to capture only changes that are the result of development. If someone’s project has tons of intricate env-aware logic, they’ll run more models than they want (i.e. more false positives). So we’re giving them the option to turn off some knobs, in the form of more-specific subselectors.
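To illustrate why env-aware logic produces false positives, here is a minimal Python sketch of a manifest comparison. The dictionary layout (`nodes`, `raw_sql`, `config`) and the function name `changed_nodes` are assumptions made for this example, not dbt's actual internals. Treating brand-new nodes as changed is also an assumption.

```python
def changed_nodes(old_manifest: dict, new_manifest: dict) -> set:
    """Return ids of nodes that are new, or that differ between the manifests."""
    old_nodes = old_manifest.get("nodes", {})
    new_nodes = new_manifest.get("nodes", {})
    return {
        node_id
        for node_id, node in new_nodes.items()
        if node_id not in old_nodes or old_nodes[node_id] != node
    }


# Hypothetical manifests: model.a is a table in prod but a view in dev purely
# because of env-aware config; model.c was added during development.
prod = {
    "nodes": {
        "model.a": {"raw_sql": "select 1", "config": {"materialized": "table"}},
        "model.b": {"raw_sql": "select 2", "config": {"materialized": "view"}},
    }
}
dev = {
    "nodes": {
        "model.a": {"raw_sql": "select 1", "config": {"materialized": "view"}},
        "model.b": {"raw_sql": "select 2", "config": {"materialized": "view"}},
        "model.c": {"raw_sql": "select 3", "config": {"materialized": "view"}},
    }
}

flagged = changed_nodes(prod, dev)
print(sorted(flagged))
```

In this sketch model.a is flagged even though nobody edited it, which is the kind of false positive the subselectors are meant to let people switch off.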

Subselectors

There is potential for overlap: a single change can trigger multiple modification categories.

state:modified.contents:

  • Models: raw file contents changed
  • Snapshots: raw file contents changed
  • Data tests: raw file contents changed
  • Analyses: raw file contents changed
  • Seeds: raw file contents changed
    • However, if the file is >1 MB, we cannot compare raw file contents, so we raise a warning and just compare based on file path instead.

This alone would get a lot of people what they want! It’s basically “just hash the files,” excluding YAML configuration.
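A minimal sketch of the "just hash the files" idea, including the large-seed fallback, might look like the following. The function name `content_fingerprint`, the exact byte cutoff, and the choice of SHA-256 are assumptions for illustration and are not taken from dbt's implementation.

```python
import hashlib
import warnings

# Assumption: stands in for the ">1 MB" cutoff described above.
MAX_SEED_BYTES = 1_000_000


def content_fingerprint(path: str, raw: bytes, is_seed: bool = False) -> str:
    """Fingerprint a file by its raw contents, or by its path for oversized seeds."""
    if is_seed and len(raw) > MAX_SEED_BYTES:
        # Too large to compare contents: warn and fall back to the file path.
        warnings.warn(f"{path} is too large to hash; comparing by path only")
        return "path:" + path
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def contents_modified(path: str, old_raw: bytes, new_raw: bytes, is_seed: bool = False) -> bool:
    """True when the two versions of a file have different fingerprints."""
    return content_fingerprint(path, old_raw, is_seed) != content_fingerprint(path, new_raw, is_seed)
```

Note the consequence of the fallback in this sketch: an oversized seed whose path is unchanged never registers as modified, however much its rows change.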

state:modified.configs:

  • Models: changes to materialized, quoting, bind, transient, sort/dist, partition_by, incremental_strategy…
    • This category captures changes in dbt_project.yml or {{config()}} blocks. If the changes are made in a {{config()}} block, they will also be picked up as content changes.
    • If someone has env-aware logic for materialized, where a model is a view in dev and a table in prod, they will not want to include this.
  • Snapshots: unique_key, strategy, …
  • Seeds: quoting, column_types
  • Schema tests: severity changes

state:modified.descriptions:

  • If persist_docs is turned on for a node, description changes count as modifications. (If persist_docs covers only columns, only column descriptions count; if only relations, only top-level descriptions; if both, then both.)
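The persist_docs rule can be sketched as below. The node layout (`config`, `description`, `columns`) and the function name `description_modified` are assumptions for this example; only the `relation` and `columns` keys of persist_docs are taken from dbt's documented config.

```python
def description_modified(old: dict, new: dict) -> bool:
    """Compare only the descriptions that persist_docs would write to the database."""
    persist = new.get("config", {}).get("persist_docs", {})

    # Top-level (relation) description counts only if relation docs are persisted.
    if persist.get("relation") and old.get("description") != new.get("description"):
        return True

    # Column descriptions count only if column docs are persisted.
    if persist.get("columns"):
        old_cols = {name: col.get("description") for name, col in old.get("columns", {}).items()}
        new_cols = {name: col.get("description") for name, col in new.get("columns", {}).items()}
        return old_cols != new_cols

    return False


# Hypothetical node: only relation-level docs are persisted.
old_node = {
    "config": {"persist_docs": {"relation": True}},
    "description": "Orders",
    "columns": {"id": {"description": "Primary key"}},
}
new_node = {
    "config": {"persist_docs": {"relation": True}},
    "description": "Orders",
    "columns": {"id": {"description": "Surrogate key"}},
}
print(description_modified(old_node, new_node))
```

Here the column description changed, but since only relation docs are persisted the node is not treated as modified.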

state:modified.database_representations:

  • Models: changes to the configured database, schema, identifier. This value represents the manual input only, and it’s different from the resolved database representation, which depends on the target and generate_x_name macros.
    • If someone manually sets schema = target.schema, or schema = target.schema ~ '_suffix' instead of using the generate_schema_name macro, that will register as a change between environments and they’ll want to turn this off.
    • Depending on the generate_x_name logic and the current environment, a change to the configured value may not actually change the database representation. We’ll still register it as a modification.
  • Seeds: treated the same as models.
  • Sources: database, schema, or identifier has changed. If someone has env-aware definitions, they’ll want to turn this off.
  • Snapshots: treated the same as sources.

Default behavior

I think state:modified should include all changes from all the categories above. The question mark is whether database_representations should be included in the default, since this is the area where people do the most custom things, and it’s the knob that will likely be switched off most often. For the sake of clarity, I think it’s best to have the state:modified selector be a superset of all modified subselectors.
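One way to picture the "superset" default, and the overlap between categories noted earlier, is a table of per-category checks whose results are unioned. The check functions, the node keys (`raw_sql`, `config`, `database`, `schema`, `identifier`), and the `enabled` parameter are hypothetical and only serve this sketch.

```python
from typing import Callable, Dict, Iterable, Optional, Set

Check = Callable[[dict, dict], bool]

# Hypothetical per-category checks, keyed by subselector name.
CHECKS: Dict[str, Check] = {
    "contents": lambda old, new: old.get("raw_sql") != new.get("raw_sql"),
    "configs": lambda old, new: old.get("config") != new.get("config"),
    "database_representations": lambda old, new: any(
        old.get(key) != new.get(key) for key in ("database", "schema", "identifier")
    ),
}


def modified_categories(old: dict, new: dict, enabled: Optional[Iterable[str]] = None) -> Set[str]:
    """Return every enabled category whose check fires; default is all categories."""
    names = list(CHECKS) if enabled is None else list(enabled)
    return {name for name in names if CHECKS[name](old, new)}


# A change inside a config() block shows up as both a content and a config change.
old_node = {"raw_sql": "{{ config(materialized='view') }} select 1",
            "config": {"materialized": "view"}, "schema": "analytics"}
new_node = {"raw_sql": "{{ config(materialized='table') }} select 1",
            "config": {"materialized": "table"}, "schema": "analytics"}

print(sorted(modified_categories(old_node, new_node)))
print(sorted(modified_categories(old_node, new_node, enabled=["database_representations"])))
```

Under this framing, plain state:modified corresponds to leaving `enabled` unset, and each subselector corresponds to passing a single category name.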

Future art

state:modified.macros:

  • A macro’s raw contents have changed
  • By extension, state:modified.macros+ would include all downstream models, tests, etc. that call (directly or indirectly) a macro that has changed
  • This also includes implicit macro dependencies such as generate_schema_name

state:modified.vars:

  • A var value has changed
  • By extension, state:modified.vars+ would include all downstream models, tests, etc. that call (directly or indirectly) a var that has changed

We will update state:modified to include both of these as well.
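The "+" extension for macros and vars amounts to a downstream walk over the dependency graph. A minimal sketch, assuming a hypothetical `children` mapping from each node to the nodes that call it directly, could be:

```python
from collections import deque
from typing import Dict, Iterable, Set


def downstream_of(changed: Iterable[str], children: Dict[str, Iterable[str]]) -> Set[str]:
    """Breadth-first walk returning the changed nodes plus everything downstream."""
    seen = set(changed)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical graph: a macro is called by one model, which feeds another
# model and a test, so the dependency on the macro is indirect for those two.
children = {
    "macro.cents_to_dollars": ["model.orders"],
    "model.orders": ["model.revenue", "test.orders_not_null"],
}

selected = downstream_of({"macro.cents_to_dollars"}, children)
print(sorted(selected))
```

The `seen` set keeps the walk from revisiting nodes, so graphs in which several paths reach the same model are handled without duplicates.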

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

4 reactions
dwallace0723 commented, Nov 11, 2020

@jtcohen6 just here to give a big 👍 to the idea of modification subselectors. Specifically, state:modified.contents. We have such a heavy reliance on environment variables in our dbt project that using state:modified is effectively a non-starter for us right now. Would love to be able to use it in the future though!

0 reactions
ncolomer commented, Feb 24, 2021

This definitely looks promising!

Up to now, the current state:modified feature has only partially helped us because some of our dbt models rely heavily on variables and Jinja templating. Today, this forces us to always redeploy (i.e. --full-refresh) those models, and some of them are costly because they materialize data.

The state:modified.macros and state:modified.vars subselectors (which would be included by default in state:modified) would be a great addition to solve this problem.
