Review UX for --defer
We’re making a subtle-yet-significant change to deferral in v0.19 (#2946, #2954). Deferral is already a complex and under-appreciated feature. It’s well and good to document those changes, but there are also things we can do within this codebase to make the feature more intuitive for users.
Naming
Should we still call this defer?
Here’s what we’re trying to get across:
- You don’t need to re-do work if you don’t have to. You can stand on the shoulders of giants, i.e. your own past self.
- The flexibility of “failing over” onto a backup set. Missing a resource you need? No worries, let’s just grab it from somewhere we know it already exists.
- Cloning without actually cloning. We all get read-only versions of the same data, and I only need to run the pieces I’m tinkering with.
What’s good about defer?
- I like the semantic affinity between defer (deference) and ref (reference). Any closer and it’d be really confusing (imagine calling this --refer)
- “defer to” is really the sense we’re after: I don’t have all the tools for a full answer, so I’ll defer to the experts (=production), defer to tradition (=past run). We could do a simple, more explicit rename: --defer-to-state
- One of my favorite high school English teachers used the phrase “a pleasure deferred” when describing, with gentle sarcasm, an unpleasant task that was too time-consuming for the current moment. (“Ah, yes, I was going to have Jeremy restate the major points of the Grand Inquisitor, but with scarce minutes remaining, I suppose it will have to be a pleasure deferred.”) I think of Mr. Ward often, as I breeze past long-running parent models in my dev runs.
What’s bad about defer?
- kind of a filler word, could mean anything
- carries a temporal aspect (“do it later”), rather than a spatial one (“do it over here instead”)
Alternative metaphors:
- backup, safety net, entrust, trust fall
- shortcut, jump ahead, starting over from a save point in a platformer level
- look upstream, switch upstream
- borrow, mooch off of
- pinch hitting (player’s out? put me in coach)
- in loco parentis (literally true)
- pulling off a tablecloth without stuff breaking
Any of those do anything for anyone?
Developer experience
How can we make it clear to users which models/resources have been deferred, and which haven’t?
@jtcohen6: We currently have a debug log that lists the number of resources being deferred, and a sample of up to 5 (though the current wording is "Merged {x} items from state"). We could log that to stdout instead. We could also take it one step further, and determine which deferred resources are actually relevant to (upstream of) the models or tests being run. That would take some extra work, but it feels worthwhile:
$ dbt run -m model_a model_b
Running with dbt=0.19.0
Found 7 models, 4 tests, 1 snapshot, 0 analyses, 138 macros, 0 operations, 4 seed files, 1 source
DEFERRED 1 upstream relation: analytics.model_a
21:52:09 | Concurrency: 1 threads (target='dev')
21:52:09 |
21:52:09 | 1 of 1 START view model dbt_jcohen.model_b........................... [RUN]
21:52:09 | 1 of 1 OK created view model dbt_jcohen.model_b...................... [CREATE VIEW in 0.06s]
21:52:09 |
21:52:09 | Finished running 1 view model in 0.32s.
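The "extra work" of finding which deferred resources sit upstream of the selection could be sketched roughly as below. This is not dbt's implementation: the function name `relevant_deferred`, the `parents` mapping, and the `deferred` set are all hypothetical stand-ins for whatever the graph and manifest actually expose.

```python
from collections import deque


def relevant_deferred(selected, parents, deferred):
    """Return the deferred nodes that are upstream of the selected nodes.

    selected: iterable of node names being run (hypothetical input)
    parents:  dict mapping a node name to a list of its direct parents
    deferred: set of node names whose references resolve to the other state
    """
    seen = set()
    found = set()
    queue = deque(selected)
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in deferred:
                # A deferred parent is read from the other namespace,
                # so its own ancestors are not needed for this run.
                found.add(parent)
            else:
                queue.append(parent)
    return sorted(found)


# Toy graph: raw_source -> model_a -> model_b -> model_c
parents = {
    "model_a": ["raw_source"],
    "model_b": ["model_a"],
    "model_c": ["model_b"],
}
print(relevant_deferred(["model_b"], parents, {"model_a", "model_c"}))
# ['model_a']
```

Here model_c is deferred but downstream of the selection, so it is left out of the message.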
@drewbanin: I’m almost picturing that those nodes show up in the stdout logs as though we were running them, but they have a status like DEFERRED or UPSTREAM:
$ dbt run -m model_a model_b
Running with dbt=0.19.0
Found 7 models, 4 tests, 1 snapshot, 0 analyses, 138 macros, 0 operations, 4 seed files, 1 source
21:52:09 | Concurrency: 1 threads (target='dev')
21:52:09 |
21:52:09 | 1 of 2 SKIP relation dbt_jcohen.model_a.............................. [DEFERRED]
21:52:09 | 2 of 2 START view model dbt_jcohen.model_b........................... [RUN]
21:52:09 | 2 of 2 OK created view model dbt_jcohen.model_b...................... [CREATE VIEW in 0.06s]
21:52:09 |
21:52:09 | Finished running 1 view model in 0.32s.
We do store deferred as a node attribute in the manifest. Is there value in surfacing that information in the docs site? It’s already sort of there, implicit in each model’s compiled SQL, since deferred nodes will have rendered their references into a different namespace.
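As a rough illustration of surfacing that attribute, the sketch below lists the nodes flagged as deferred in a manifest that has already been loaded into a dict. It assumes a top-level `nodes` mapping whose entries carry a boolean `deferred` key; the exact layout may differ between dbt versions, and the helper names are hypothetical.

```python
import json


def deferred_node_ids(manifest):
    """Return the sorted IDs of nodes whose 'deferred' flag is truthy.

    Assumes manifest["nodes"] maps node IDs to dicts (an assumption
    about the manifest layout, not a documented guarantee).
    """
    nodes = manifest.get("nodes", {})
    return sorted(uid for uid, node in nodes.items() if node.get("deferred"))


def load_manifest(path):
    """Read a manifest JSON file from disk into a dict."""
    with open(path) as fh:
        return json.load(fh)


# In-memory example with made-up node IDs
example = {
    "nodes": {
        "model.my_project.model_a": {"deferred": True},
        "model.my_project.model_b": {"deferred": False},
    }
}
print(deferred_node_ids(example))
# ['model.my_project.model_a']
```

Against a real project this might be called as `deferred_node_ids(load_manifest("target/manifest.json"))`, with the path adjusted to wherever the artifacts are written.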
Top GitHub Comments
Hello - this is a great issue thread, I love the engagement with the community on opinions re naming. Also, @jtcohen6 “a pleasure deferred” got a chuckle from me - also a helpful reminder I need to read more Dostoevsky - however I keep putting this off, a pleasure deferred!
On naming
On our Go implementation of the dbt CLI, we have the following:
We also built an in-house version of this to extend the dbt CLI before the defer feature existed. It is referred to in this blog. When we invoke a model run we can, for example, read from ‘prod’ data by using dbt monzo upstream prod -m modelA. ‘prod’ can be any target, so you could point it to another developer’s dev area if you were collaborating on different PRs. Theo, who built this a few years back, posted more details on how it works on the dbt community page.
Here’s the CLI printout from our upstream
In short - my vote would be for --upstream with a nice short -u option too.
On behaviour
I wonder whether this should be the default behaviour? Or at least configurable to be the default behaviour at the target level.
When I was owning pipelines, I found myself using upstream=prod most of the time. And at a previous data platform company I worked at, the default behaviour was to read from production data, as data developers in most cases want to know what their pipeline modifications will look like in production.
Just my 2 cents on naming: I’m working on an in-house dbt deployment tool that basically wraps dbt run and dbt test with a bunch of options to make build and deployment workflows a bit easier. I’m wrapping the idea of using a different upstream source simply in an --upstream-source or -u parameter. You just supply the name of the target you want to use as your upstream source and we handle generation of manifests if needed and the --defer and --state syntax for you.
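A wrapper along those lines might translate an upstream target name into the --defer and --state flags roughly as sketched here. This is not the commenter's tool: `build_dbt_command` and the `state_dirs` mapping (target name to the directory holding that target's manifest) are hypothetical, and manifest generation is left out entirely.

```python
import shlex


def build_dbt_command(subcommand, models, upstream_source=None, state_dirs=None):
    """Build an argv list for dbt, adding --defer/--state when an
    upstream source is named.

    state_dirs is a hypothetical mapping from target name to the
    directory containing that target's manifest.
    """
    argv = ["dbt", subcommand]
    if models:
        argv += ["-m", *models]
    if upstream_source is not None:
        known = state_dirs or {}
        if upstream_source not in known:
            raise ValueError(f"no state directory known for {upstream_source!r}")
        argv += ["--defer", "--state", known[upstream_source]]
    return argv


argv = build_dbt_command("run", ["model_b"], "prod", {"prod": "artifacts/prod"})
print(shlex.join(argv))
# dbt run -m model_b --defer --state artifacts/prod
```

Returning an argv list rather than a string keeps quoting out of the wrapper; the list can be handed to `subprocess.run` directly.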