Generalized `dbt build` command
See also: #1054, #1227, #2234, this comment
Describe the feature
Each dbt node-resource type has a task-command associated with it:
- models = `dbt run`
- tests = `dbt test`
- seeds = `dbt seed`
- snapshots = `dbt snapshot`
- sources = `dbt source snapshot-freshness`
Additionally, there could be a generalized command, `dbt build` [1], that would step through a DAG of multiple resource types and “build” them accordingly.
What would this look like? I imagine an argument syntax similar to `dbt ls`, i.e.
dbt build --select ... --exclude ... --resource-type ...
[1] Name subject to change, though for the ultimate command of the data build tool, it’d be hard to think of one more apropos…
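To make the proposed argument shape concrete, here is a minimal Python sketch of a parser with the three flags named above, plus a filter that applies them to a set of nodes. Everything in it is an assumption for illustration: `make_parser`, `filter_nodes`, and the `NODES` mapping are hypothetical names, and selection is by exact node name only, which is far simpler than dbt's real selection syntax.

```python
import argparse

# Hypothetical project contents: node name -> resource type.
NODES = {
    "my_source.table": "source",
    "my_seed": "seed",
    "model_a": "model",
    "my_snapshot": "snapshot",
    "model_b": "model",
}

RESOURCE_TYPES = ["model", "test", "seed", "snapshot", "source"]


def make_parser():
    """Build a parser with the flag names floated in the proposal (illustrative only)."""
    parser = argparse.ArgumentParser(prog="dbt build")
    parser.add_argument("--select", nargs="+", default=[])
    parser.add_argument("--exclude", nargs="+", default=[])
    parser.add_argument("--resource-type", choices=RESOURCE_TYPES, default=None)
    return parser


def filter_nodes(nodes, args):
    """Return the sorted node names left after applying select, exclude, and resource type."""
    chosen = set(args.select) if args.select else set(nodes)
    chosen -= set(args.exclude)
    if args.resource_type is not None:
        chosen = {name for name in chosen if nodes.get(name) == args.resource_type}
    return sorted(name for name in chosen if name in nodes)


args = make_parser().parse_args(
    ["--select", "model_a", "my_seed", "--resource-type", "model"]
)
selected = filter_nodes(NODES, args)  # only model_a survives the resource-type filter
```

An empty `--select` is treated here as "everything", mirroring how the per-resource commands behave when no selection is given; whether `dbt build` should do the same is an open design question.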
Example
Let’s imagine we had `model_a` that depends on a source (`my_source.table`) and a seed (`my_seed`), a snapshot (`my_snapshot`) of `model_a`, and then `model_b` which selected from `my_snapshot`. Of course, we also have tests on many of them. Roughly:
my_source.table --> my_seed --> model_a --> my_snapshot --> model_b
Within a single invocation, `dbt build` would go through motions analogous to running the following dbt commands. It would proceed to the next numbered step only if all upstream steps succeed:
1a. dbt seed --select my_seed
1b. dbt source snapshot-freshness --select my_source.table
2a. dbt test --models my_seed
2b. dbt test --models source:my_source.table
3. dbt run --models model_a
4. dbt test --models model_a
5. dbt snapshot --select my_snapshot
6. dbt test --models my_snapshot
7. dbt run --models model_b
8. dbt test --models model_b
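The ordering above can be sketched as a walk over the DAG in dependency order, where each node is built, then tested, and anything downstream of a failure is skipped. This is a minimal Python sketch under stated assumptions, not dbt's implementation: the `UPSTREAM`, `RESOURCE_TYPE`, and `ACTION` tables and the `build` and `fake_execute` functions are hypothetical, and the executor is a stand-in that simulates a failing test on `model_a`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical graph for the example: node -> set of direct upstream nodes.
UPSTREAM = {
    "my_source.table": set(),
    "my_seed": set(),
    "model_a": {"my_source.table", "my_seed"},
    "my_snapshot": {"model_a"},
    "model_b": {"my_snapshot"},
}

RESOURCE_TYPE = {
    "my_source.table": "source",
    "my_seed": "seed",
    "model_a": "model",
    "my_snapshot": "snapshot",
    "model_b": "model",
}

# Illustrative action label per resource type (an assumption, not dbt internals).
ACTION = {"source": "freshness", "seed": "seed", "model": "run", "snapshot": "snapshot"}


def build(upstream, execute):
    """Build then test each node in dependency order.

    `execute(action, node)` returns True on success. A node is skipped
    when any direct parent did not succeed, so failures propagate downstream.
    """
    results = {}
    for node in TopologicalSorter(upstream).static_order():
        if any(results[parent] != "success" for parent in upstream[node]):
            results[node] = "skipped"
            continue
        ok = execute(ACTION[RESOURCE_TYPE[node]], node) and execute("test", node)
        results[node] = "success" if ok else "error"
    return results


log = []


def fake_execute(action, node):
    """Stand-in executor: records the call and fails only the tests on model_a."""
    log.append((action, node))
    return not (action == "test" and node == "model_a")


results = build(UPSTREAM, fake_execute)
# model_a ends as "error"; my_snapshot and model_b are "skipped" and never executed.
```

Because a skipped node is itself not a success, the skip carries through to `model_b` without any extra bookkeeping, which is the "alert earlier and save compute" behavior the proposal is after.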
Complexities
- Some of these tasks are already DAG aware (`run`, `test`, `snapshot`), some are not (`seed`, `snapshot-freshness`)
- Commands support several different flags
  - How to expose when a flag is being used, and when it isn’t?
  - What about same-named flags that do subtly different things across commands? e.g. `dbt run --full-refresh` vs. `dbt seed --full-refresh`
- Node types are just about 1:1 with task types, though `dbt test` almost feels like an exception. Technically, `dbt test` operates on test nodes, but other node types can be passed into its selection syntax, with selector expansion as the last step, so it “feels” like you’re testing a model or a snapshot. (Edit: this behavior may someday change.)
- This risks a lot of our existing intuitions that come from having resource types nicely delineated. Put differently: what if it all just falls apart?
- What if it works so well that 90% of dbt deployments are just `dbt build`? Should we be wary of creating one command to rule them all?
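The point about `dbt test` accepting non-test nodes can be illustrated with a small sketch of selector expansion: the user names models or snapshots, and the selection is expanded to the test nodes attached to them as a final step. The `TESTS` table and `expand_to_tests` are hypothetical, and the rule used here (a test is picked when any one of its parents is selected) is an assumption rather than a statement of dbt's behavior.

```python
# Hypothetical test nodes, each mapped to the node(s) it checks.
TESTS = {
    "not_null_model_a_id": {"model_a"},
    "unique_my_snapshot_id": {"my_snapshot"},
    "relationships_model_b_model_a": {"model_a", "model_b"},
}


def expand_to_tests(selected, tests=TESTS):
    """Return the sorted names of tests attached to at least one selected node."""
    selected = set(selected)
    return sorted(name for name, parents in tests.items() if parents & selected)


tests_for_model_a = expand_to_tests(["model_a"])
# Includes the relationships test even though model_b was not selected.
```

The relationships test shows why this is a real design choice: a test with several parents could reasonably require all of them to be selected instead, and a generalized `dbt build` would have to pick one rule.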
Describe alternatives you’ve considered
- Doing a more particularized version of this, e.g. `dbt run+test` (as outlined in linked issues)
- Not doing this at all, and leaving the federation of one resource type = one command/invocation. Is this a good abstraction that we should fight to keep?
Who will this benefit?
- Bigger, more complex projects that want to run subsets of different resource types. Today, that can only be accomplished through complex selection syntax leveraging tags. YAML selectors improve this somewhat, but they’re not the answer.
- Projects with snapshots that participate in the middle of the DAG
- Deployments that want to test upstream models before running downstream models, so as to alert earlier and save compute time/$$ in the event of failure
Issue Analytics
- Created 3 years ago
- Reactions: 10
- Comments: 13 (8 by maintainers)
@jtcohen6 I have been stuck on this idea that I just cannot shake! Wanted to mention it here.
IF:
`dbt source snapshot-freshness`
AND
THEN:
`dbt build` command would be well-positioned to skip running models where a rebuild would result in exactly the same database object that already exists in the database

I think there’s some more formality / rigor to apply here, and I’m actually not 100% sure that this requires the existence of a `dbt build` command, but wanted to throw it out there for consideration.

To get more concrete, here are some of the examples I’m considering:
A view model only needs to be built when:
A table/incremental model only needs to be built when:
I think that we can get at a lot of this stuff with the `state:` or `config.materialized` selectors, so really my thinking boils down to:
`dbt build` command?

@kosti-hokkanen-supermetrics Cool to hear what you’re hoping to do with it! The first cut of `dbt build` won’t allow much configuration, and its behavior will be defined by some opinionated rules, including:

That said, I believe all the right constructs are there. I bet you can combine several `dbt build` invocations, paired with test severity and thoughtful node selection, to accomplish the thing you’re after.