Give dbt basic workflow capabilities

Describe the feature

We need a way to call different chains of dbt commands more flexibly, in different orders and on different schedules, for a single dbt project, while capturing the steps of these workflows along with their parameters, the relationships between steps, etc. It would also be helpful to store the configuration for these workflows and steps in the dbt project itself, so that changes to the orchestration configuration can be versioned, controlled, managed, and deployed using git-based tools in the same way as everything else in dbt.

Describe alternatives you’ve considered

We have been using bash scripts and Docker to capture this along with other enterprise workflow management software. We have also seen other dbt users use Airflow, Luigi, etc. All of these add significant overhead and complexity.

Additional context

Should not be database-specific.

We need a way to call different chains of dbt commands more flexibly, potentially in a different order for each dbt project. Developers on a project should be able to see its chains of dbt commands clearly in git/AZDO, and to control, review, update, and test those chains through the same CI/CD process that we use for dbt models, macros, and tests. The chain of commands, models, tests, and selectors can affect how a developer writes additional models and tests, so developers need to understand the flow of what is going on in a project at any given time. Example chain of commands:

dbt clean
dbt deps
dbt run-operation {some-macro} --args {arg1}
dbt run-operation {some-other-macro} --args {arg2}
dbt seed
dbt source snapshot-freshness
dbt test --models source:*
dbt run --models tag:hourly
dbt test
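One way to picture the request is a chain like the one above stored as data in the repository and executed step by step. The following is only a minimal sketch under stated assumptions, not an existing dbt feature: the `WORKFLOWS` mapping, the workflow name `hourly`, and the functions `run_step` and `run_workflow` are all hypothetical, and the two `run-operation` steps are left out because their macro names and arguments are placeholders in the issue.

```python
import shlex
import subprocess
from typing import Callable, Dict, List

# Hypothetical workflow definition; in practice this could live in a
# versioned file inside the dbt project so it goes through code review.
WORKFLOWS: Dict[str, List[str]] = {
    "hourly": [
        "dbt clean",
        "dbt deps",
        "dbt seed",
        "dbt source snapshot-freshness",
        "dbt test --models source:*",
        "dbt run --models tag:hourly",
        "dbt test",
    ],
}


def run_step(command: str) -> int:
    """Run one command without a shell and return its exit code."""
    return subprocess.run(shlex.split(command)).returncode


def run_workflow(name: str, runner: Callable[[str], int] = run_step) -> List[str]:
    """Run the steps of a workflow in order, stopping at the first failure.

    Returns the list of commands that completed successfully.
    """
    completed: List[str] = []
    for command in WORKFLOWS[name]:
        if runner(command) != 0:
            raise RuntimeError(f"step failed: {command}")
        completed.append(command)
    return completed
```

Passing a custom `runner` (for example one that only logs the command) gives a dry run, which makes the chain itself testable in CI without touching a database.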

It’s likely that over time each project will have its own divergent set of dbt commands, tags, parameters, etc. We also need a way to call different dbt commands on different schedules. The most common case is calling dbt snapshot (perhaps along with a few tests, etc.) more often than other dbt commands. It would also be helpful to call dbt seed less often, perhaps only when a change in a seed file is detected (although it’s a pretty low-cost operation).
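The scheduling side of this could be sketched as a per-workflow interval table plus a check for which workflows are due. Everything here is an assumption made for illustration: the workflow names, the intervals, and the `due_workflows` function are hypothetical and do not correspond to anything in dbt.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical intervals: snapshots run most often, seeds least often.
SCHEDULES: Dict[str, timedelta] = {
    "snapshot": timedelta(minutes=15),
    "hourly": timedelta(hours=1),
    "seed": timedelta(days=1),
}


def due_workflows(now: datetime, last_run: Dict[str, datetime]) -> List[str]:
    """Return the workflows whose interval has elapsed since their last run.

    A workflow with no recorded run is always considered due.
    """
    due: List[str] = []
    for name, interval in SCHEDULES.items():
        previous = last_run.get(name)
        if previous is None or now - previous >= interval:
            due.append(name)
    return due
```

A scheduler (cron, a CI pipeline, or similar) would call `due_workflows` periodically and hand each returned name to whatever runs the corresponding chain of commands.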

Who will this benefit?

Developers and analytics users, who will be able to clearly see the dbt workflow job chains and parameters right alongside their dbt models and code. Architects, who can then worry less about building separate job orchestration infrastructure to make up for capabilities that dbt does not have built in.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 7
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
marshalljonj commented, Feb 11, 2020

Hi, it would be useful to have a way to force the -m flag to be used with dbt run. We don’t want any developer to be able to accidentally redeploy all of our models, as this would a) cause a disruption to service and b) be computationally expensive, especially as we use materialised tables in some of our models. Forcing a list of models to be specified would help mitigate this issue.
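Until something like this exists in dbt itself, a guard of this kind could live in a thin wrapper around the CLI. The sketch below is an assumption about how such a wrapper might look, not a dbt feature: `require_model_selection` is a hypothetical function, and it only inspects the arguments that would be passed to dbt. It accepts `-m`/`--models` as mentioned in this thread, plus the `-s`/`--select` spelling used by newer dbt releases.

```python
from typing import List

# Flags that restrict `dbt run` to a subset of models.
SELECTION_FLAGS = {"-m", "--models", "-s", "--select"}


def require_model_selection(argv: List[str]) -> None:
    """Raise ValueError if `dbt run` is invoked without a model selection.

    `argv` is the argument list that would follow `dbt`, e.g.
    ["run", "-m", "tag:hourly"]. Other subcommands pass through unchecked.
    """
    if not argv or argv[0] != "run":
        return
    for arg in argv[1:]:
        # Handle both "--models x" and "--models=x" spellings.
        if arg.split("=", 1)[0] in SELECTION_FLAGS:
            return
    raise ValueError("refusing to run all models: pass -m/--models with a selection")
```

A wrapper script would call `require_model_selection(sys.argv[1:])` before handing the same arguments to dbt, so a bare `dbt run` fails fast instead of rebuilding every model.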

0 reactions
github-actions[bot] commented, Dec 14, 2021

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
