Give dbt basic workflow capabilities
Describe the feature
We need a way to more flexibly call different chains of dbt commands in different orders and schedules for a single dbt project while capturing the steps of these workflows along with the parameters, relationships between steps, etc. It would also be helpful to be able to store the configuration for these workflows and steps in the dbt project itself, so changes to the configuration for the orchestration can be versioned, controlled, managed, and deployed using git-based tools in the same way that everything else in dbt is.
Describe alternatives you’ve considered
We have been using bash scripts and Docker to capture this along with other enterprise workflow management software. We have also seen other dbt users use Airflow, Luigi, etc. All of these add significant overhead and complexity.
Additional context
Should not be database-specific.
We need a more flexible way to call different chains of dbt commands, potentially in a different order, for any given dbt project. Developers on a project should be able to see its chains of dbt commands clearly in git/AZDO, and to control, review, update, and test those chains through the same CI/CD process that we use for dbt models, macros, and tests. On some projects, the chain of commands, models, tests, and selectors affects how a developer writes additional models and tests, so developers need to understand the flow for a given project at any given time. Example chain of commands:
dbt clean
dbt deps
dbt run-operation {some-macro} --args {arg1}
dbt run-operation {some-other-macro} --args {arg2}
dbt seed
dbt source snapshot-freshness
dbt test --models source:*
dbt run --models tag:hourly
dbt test
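As a rough illustration of what "a chain of commands stored with the project" could look like, here is a minimal Python sketch. It is not an existing dbt feature: `HOURLY_WORKFLOW`, `run_workflow`, and the `execute` parameter are hypothetical names invented for this example. The two `run-operation` steps are left out because their macro names and arguments are placeholders in the chain above.

```python
import shlex
import subprocess

# Steps copied from the example chain above (run-operation steps omitted
# because their macro names are placeholders). The name is hypothetical.
HOURLY_WORKFLOW = [
    "dbt clean",
    "dbt deps",
    "dbt seed",
    "dbt source snapshot-freshness",
    "dbt test --models source:*",
    "dbt run --models tag:hourly",
    "dbt test",
]


def run_workflow(steps, execute=None):
    """Run each step in order and stop at the first non-zero exit code.

    Returns the (command, exit_code) pairs for the steps that were attempted.
    """
    if execute is None:
        # Default executor: no shell is involved, so selectors such as
        # source:* are passed to dbt literally instead of being glob-expanded.
        def execute(argv):
            return subprocess.run(argv).returncode

    results = []
    for step in steps:
        code = execute(shlex.split(step))
        results.append((step, code))
        if code != 0:
            break  # later steps depend on earlier ones, so do not continue
    return results
```

Because the step list is plain data in a file, it could be versioned and reviewed in the same repository as the models. Passing a custom `execute` function makes it possible to dry-run or test a chain without invoking dbt.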
It’s likely that over time each project will have its own divergent set of dbt commands, tags, parameters, etc. We also need a way to call different dbt commands on different schedules. The most common case is calling dbt snapshot (along with perhaps a few tests) more often than other dbt commands. It would also help to call dbt seed less often, perhaps only when a change in a seed file is detected (although it’s a pretty low-cost operation).
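One way to approximate "run dbt seed only when a seed file has changed" outside of dbt is to fingerprint the seed files and compare against the last recorded value. The sketch below is an assumption-laden illustration, not dbt functionality: the function names, the state file, and the idea of passing the seed directory explicitly are all hypothetical.

```python
import hashlib
from pathlib import Path


def seeds_fingerprint(seed_dir):
    """Return a SHA-256 digest over the names and contents of all CSV files."""
    digest = hashlib.sha256()
    for path in sorted(Path(seed_dir).rglob("*.csv")):
        # Include the relative path so that renames also change the digest.
        digest.update(path.relative_to(seed_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def seeds_changed(seed_dir, state_file):
    """Return True if the seeds differ from the last recorded fingerprint."""
    state = Path(state_file)
    previous = state.read_text().strip() if state.exists() else None
    return seeds_fingerprint(seed_dir) != previous


def record_seeds(seed_dir, state_file):
    """Store the current fingerprint; call this only after dbt seed succeeds."""
    Path(state_file).write_text(seeds_fingerprint(seed_dir))
```

A scheduler script would call `seeds_changed` first, run `dbt seed` only when it returns `True`, and then call `record_seeds`. Keeping the check and the record step separate means a failed seed run is retried on the next schedule.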
Who will this benefit?
Developers and analytics users, who would be able to clearly see the dbt workflow job chains and parameters right alongside their dbt models and code. Architects, who would no longer need to build up separate job orchestration infrastructure to make up for capabilities that dbt does not have built in.
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 7
- Comments: 6 (3 by maintainers)
Hi, it would be useful to have a way to force the -m flag to be used when using dbt run. We don’t want any developer to be able to accidentally force all of our models to redeploy, as this would a) cause a disruption to service and b) be computationally expensive, especially as we use materialised tables in some of our models. Forcing a list of models to be run would help mitigate this issue.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
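For the request in the comment above to require a model selection on dbt run, a team could put a small check in front of dbt in its own wrapper script or CI job. The sketch below is a hypothetical guard, not a dbt setting: `has_model_selection` and `SELECTION_FLAGS` are made-up names, and the flag list assumes the `-m`/`--models` spelling from the comment plus the `-s`/`--select` spelling used by newer dbt versions.

```python
# Flags that name an explicit set of models (assumed list, adjust per version).
SELECTION_FLAGS = {"-m", "--models", "-s", "--select"}


def has_model_selection(argv):
    """Check the arguments that would follow `dbt` on the command line.

    Returns True when the command is not `dbt run`, or when `dbt run` is
    given an explicit selection flag. Returns False for an unrestricted run.
    """
    if "run" not in argv:
        return True  # other subcommands (test, seed, run-operation) pass
    for arg in argv:
        flag = arg.split("=", 1)[0]  # also accept the --models=tag:x form
        if flag in SELECTION_FLAGS:
            return True
    return False
```

A wrapper would call `has_model_selection(sys.argv[1:])` and exit with an error instead of invoking dbt when it returns `False`. This only protects runs that go through the wrapper, so it reduces accidental full redeploys rather than preventing them outright.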