New node type: reports
**Edit:** I previously called these `exposures`. I’ve changed it to `report` to keep it more tangible for this first version.
dbt Core needs to have a semantic understanding of a `report`. A “report” is a generalization of a “dashboard”, though it could also be an ML model, Jupyter notebook, R Shiny app, in-app data viz, etc.

The `report` node serves two purposes:
- Define a set of models as dependencies
- Define metadata to populate an embed tile and a special dbt-docs “landing page”
Why?
- We want to be able to push information about data quality and lineage from dbt into external tools. Exposures should be the Core side of that contract.
- The same reason we added `sources`: there’s a big piece missing from the DAG today. Once downstream uses of dbt models are registered as `reports`, dbt project maintainers can start asking questions like:
  - Which models are our most critical dependencies?
  - Which final models are only used by exposures with very few/irregular views?
  - Which final models aren’t used at all?
Tentative spec
**Edit:** updated based on comments below
`models/anything.yml`

```yml
version: 2

exposures:
  - name: orders_dash
    type: dashboard  # one of: {dashboard, notebook, analysis, ml, application}
    url: https://fishtown.looker.com/dashboards/1
    maturity: high   # i.e. importance/usage/QA/SLA/etc. one of: {low, medium, high}
    description: >
      This is a dashboard of monthly orders, with customer
      attributes as context.

    depends_on:
      - ref('fct_orders')
      - ref('dim_customer')
      - source('marketplace', 'currency_conversions')

    owner:
      email: jeremy@fishtownanalytics.com  # required
      name: "Jeremy from FP&A"             # optional
```
Core functionality
By way of `depends_on`, I’d expect `dbt run -m +report:orders_dash` to run all upstream models, and `dbt test -m +report:orders_dash` to test all upstream nodes. `dbt run -m +report:*` would run all models upstream of all reports. A report cannot itself be run or have tests defined.

**Edit:** I changed this syntax to feel more like the `source:` model selector method. Rationale: `dbt run -m orders_dash` has no effect; it’s worth calling out that this is a special thing.
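As a minimal sketch of what that selector could resolve to, assuming the project DAG is available as a networkx graph (dbt Core builds its DAG with networkx) and a hypothetical `unique_id` format for report nodes:

```python
import networkx as nx

def select_plus_report(graph: nx.DiGraph, report_unique_id: str) -> set:
    """Resolve `-m +report:<name>`: the report node plus everything upstream of it."""
    return {report_unique_id} | nx.ancestors(graph, report_unique_id)

# e.g. select_plus_report(manifest_graph, "report.my_project.orders_dash")
```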
Open questions
- What should be the docs “landing page” for a `report`? I’ll write up a related dbt-docs issue
- Crucially, we’ll need a mechanism that can parse a report’s `depends_on` from `manifest.json`, compile the list of all upstream nodes, and then search in `run_results.json` and `sources.json` for all associated tests and freshness checks. Where should that mechanism live, exactly? (There’s a rough sketch of one possibility at the end of this section.)
- `owner`: There is added benefit (and constraint) to tying this to a dbt Cloud user ID. Should we try to make a mapping via email only, instead?
- `type` + `maturity`: Should we constrain the set of options here? Offer them as free text fields? I like the eventual idea of summarizing things like:
      fct_orders is directly exposed in:
        3 dashboards, of varying maturity (high: 2, medium: 1)
        1 low-maturity ML pipeline
        2 medium-maturity apps
      fct_orders indirectly powers:
        2 medium-maturity dashboards
  Maybe that’s in dbt-docs, maybe that’s an `ls` command, maybe it’s a pipe dream. This is the piece that feels least critical for the first version.
@drewbanin to help suss out some answers
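To make the open question about that mechanism more concrete, here is a rough sketch of what it could look like if it lived in a standalone script reading dbt’s artifacts. It assumes a hypothetical `reports` key in `manifest.json` whose entries carry `depends_on` node IDs like other nodes do, and it uses field names from recent artifact schemas (`parent_map`, `unique_id` on results), which may differ across dbt versions; none of this is an actual dbt API.

```python
import json
import networkx as nx

def report_status(report_unique_id: str, target_dir: str = "target") -> dict:
    """Collect upstream nodes and their test results for one report.

    Assumes manifest.json gains a hypothetical "reports" section whose entries
    list depends_on node IDs, and that run_results.json keeps its current shape.
    """
    with open(f"{target_dir}/manifest.json") as f:
        manifest = json.load(f)
    with open(f"{target_dir}/run_results.json") as f:
        run_results = json.load(f)

    # Rebuild the DAG from the manifest's parent_map (child -> [parents]).
    graph = nx.DiGraph()
    for child, parents in manifest["parent_map"].items():
        graph.add_node(child)
        for parent in parents:
            graph.add_edge(parent, child)

    # Start from the report's declared dependencies, then walk all the way upstream.
    report = manifest["reports"][report_unique_id]  # hypothetical key
    upstream = set(report["depends_on"]["nodes"])
    for node in list(upstream):
        if node in graph:
            upstream |= nx.ancestors(graph, node)

    # Keep every test result whose test touches an upstream node.
    test_results = []
    for result in run_results["results"]:
        node = manifest["nodes"].get(result["unique_id"], {})
        if node.get("resource_type") == "test" and upstream & set(
            node.get("depends_on", {}).get("nodes", [])
        ):
            test_results.append(result)

    return {"upstream_nodes": sorted(upstream), "test_results": test_results}
```

Running the same walk over every report and grouping by `type` and `maturity` would give the kind of “fct_orders is directly exposed in …” summary sketched above, whether that ends up in dbt-docs or an `ls`-style command.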
Future work (not v1)
- Can reports declare non-model nodes in their `depends_on`? (Could reports depend on other reports?)
- Can reports modify expectations of upstream tests? I could imagine overriding a test severity, or defining a set of tests to exclude from consideration.
- Can reports be tagged?
- Can they accept arbitrary additional fields (i.e. `meta`)?
- Can we closely tie `owner` to a dbt Cloud user by email? Purpose: configurable notification channels, asset rendering in dbt-docs
@bashyroger I agree with a lot of what you said above. If you’ll humor me, I think this is a yes/and rather than an either/or.
dbt could, and someday should, pull information from database query logs to get a sense of how models are being used in production. There would be tremendous benefit in identifying unused final models that are good candidates for deprecation, or shaky models that serve as cornerstones to crucial reports yet lack the commensurate testing. This tooling would be immensely valuable to dbt developers, and I completely agree that to such an end there should be a full decoupling between dbt resources and our ability to see how they’re queried in the wild. dbt maintainers should see usage information for dashboards they know about and (especially) dashboards they don’t.
The matrix of tools here is a challenge, but not an insurmountable one. It sounds like you’ve made some significant progress on this for BigQuery + Looker. Any chance there are code snippets or approaches you’d be interested in contributing back? 😄
At the same time, there’s a lot of useful information already stored in dbt artifacts today—source freshness, test success—that we can and should put in front of data consumers and other downstream beneficiaries of dbt models. We think that means embedding it in the BI/application/usage layer. This is a subtle but significant distinction IMO:
To my mind, this roughly maps to the distinction, mentioned in Tristan’s and Jia’s “Are dashboards dead?” conversation, between ad hoc exploration on the one hand + dashboards with service-level guarantees on the other. We may see different, more precise tooling emerge to support each use case, which I really do believe to be quite different. Ultimately, I think dbt developers will want to know about both categories—in-the-wild querying and sanctioned usage—and we’ll want to find compelling ways to integrate that information in the long run.
If you want to support refables and sources (which I think is a good idea!), what about just using `ref` and `source`? Users should already have at least some level of comfort with the behavior of `ref`/`source`. Allowing sources will probably be nice for testing, too (zero database operations required).