Improve syntax for op order dependencies
Assume we have solid_a and solid_b. If we want solid_b to execute after solid_a, we can use composition functions to express the ordering:
@pipeline
def pipe():
    solid_b(solid_a())
However, if solid_b doesn’t depend on solid_a’s outputs, we need to define solid_b’s input_defs as something like:
input_defs=[InputDefinition(_START, Nothing)]
We could probably come up with a way to express order dependencies in the composition function syntax, or at least create a simple alias for InputDefinition(_START, Nothing) to make it easier to understand.
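To make the request concrete, here is a minimal sketch of the distinction between a data dependency and an order-only dependency. It uses only the Python standard library and does not call Dagster: run_in_order, data_deps, and order_deps are hypothetical names invented for illustration, and nothing here is Dagster's API or a proposed implementation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_in_order(steps, data_deps, order_deps):
    """Run each step after its data and order-only predecessors.

    steps:      name -> callable taking the outputs of its data predecessors
    data_deps:  name -> upstream names whose outputs are passed in
    order_deps: name -> upstream names that only have to finish first
    (All three names are hypothetical; this is not Dagster's API.)
    """
    # Both kinds of dependency constrain scheduling in the same way.
    graph = {
        name: set(data_deps.get(name, [])) | set(order_deps.get(name, []))
        for name in steps
    }
    outputs, executed = {}, []
    for name in TopologicalSorter(graph).static_order():
        # Only data dependencies contribute arguments.
        args = [outputs[up] for up in data_deps.get(name, [])]
        outputs[name] = steps[name](*args)
        executed.append(name)
    return executed, outputs


def solid_a():
    return "a finished"


def solid_b():
    # Takes no arguments: it never sees solid_a's output.
    return "b finished"


executed, outputs = run_in_order(
    steps={"solid_a": solid_a, "solid_b": solid_b},
    data_deps={},
    order_deps={"solid_b": ["solid_a"]},
)
```

In this model the ordering is declared next to the graph rather than in solid_b's signature, which is roughly the ergonomics the issue is asking for.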
Issue Analytics
- State:
- Created 4 years ago
- Reactions:10
- Comments:15 (14 by maintainers)
Top GitHub Comments
I’ve run into some implementation challenges related to this issue – spoke with some of your team in Slack and was encouraged to provide my use case as an example here.
So I’m building an ELT pipeline, and as a convention have set up my pipeline to have three stages: extract, load, transform. I’m using Nothing passed between them to create the dependency structure. My pipeline looks like this:

In general, “extract” means extract data from original source into an S3 staging bucket, “load” means copy data from S3 staging bucket into Redshift within my “raw data” schema, and “transform” means move data from “raw data” schema into “production ready” schema while performing necessary cleaning, filtering, transforms along the way. I’m primarily using dbt models via dagster-dbt in the transform stage.

Each of the three solids (extract, load, transform) is defined as a composite_solid, allowing me to break these stages down into child solids within each. (I’ve established these conventions for consistency because I’ll eventually be building more pipelines for different data sources, other developers will be building pipelines, etc.)

I didn’t want to have to map the Nothing input into my child solids because many of my child solids are reusable utils, and I didn’t want to have to change the solid input definition just to map Nothing for this specific use case. So, as a workaround I wrote this solid to map my Nothing to within each composite_solid:

My three stages basically look like this:
The challenge that I’m still running into is that because I haven’t mapped the Nothing input to my child solids, the dependency structure is only enforced for the do_nothing solids, but isn’t enforced for the child solids that actually do stuff (i.e. my dbt model in transform starts executing before load has finished copying data). It sounds like I could go back and map Nothing into my custom util child solids to enforce the dependency structure, however, in the case of my dbt models I’m not sure what to do because when using the create_dbt_run_solid from dagster-dbt I can’t change the solid definition in order to map the Nothing input.

I’m still learning Dagster so please let me know if my understanding of things is incorrect/incomplete – there may be simple solutions that I’ve missed, and I may be abusing the system a bit. If that is the case please let me know how you guys would suggest to resolve these issues.
In general though, as a user, I would ideally like a way to establish my dependency structure between my composite solids, and be guaranteed that the structure will be enforced across the child solids without having to do this extra layer of Nothing mapping to child solids. Additionally, my do_nothing solid feels hacky and long term I’d love a way to do away with that.

I can’t speak to the implementation challenges in your system, but as a user I think something more semantic than passing Nothing for establishing dependency structures would definitely be welcome. In my specific use case, it seems like the notion of my “extract” stage being “done” or my “load” stage being “done” really corresponds to “some asset exists in S3” or “some table has been populated in Redshift” etc and these are the real notions that should be used to establish dependencies/preconditions. Just thinking out loud now. Hope my example is helpful in brainstorming solutions and please let me know if there is anything I can do to help. Have really enjoyed using Dagster thus far!
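As a rough illustration of the guarantee asked for above (stage-level ordering that is enforced on the child solids), the sketch below expands ordering between stages into ordering between their children. It is plain standard-library Python, not Dagster: expand_stage_order and the child step names are hypothetical, and it makes the simplifying assumption that every child of a downstream stage waits for every child of the upstream stage.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def expand_stage_order(stages, stage_order):
    """Turn stage-level ordering into child-level ordering.

    stages:      stage name -> list of child step names
    stage_order: stage name -> stages that must finish first
    Returns child name -> set of child names that must finish first.
    (Hypothetical helper for illustration; not part of Dagster.)
    """
    child_deps = {
        child: set() for children in stages.values() for child in children
    }
    for stage, upstream_stages in stage_order.items():
        for upstream in upstream_stages:
            for child in stages[stage]:
                # Simplifying assumption: wait for ALL upstream children.
                child_deps[child].update(stages[upstream])
    return child_deps


# Child step names are made up; the stage names follow the comment above.
stages = {
    "extract": ["pull_from_source"],
    "load": ["copy_to_redshift"],
    "transform": ["run_dbt_models"],
}
stage_order = {"load": ["extract"], "transform": ["load"]}

child_deps = expand_stage_order(stages, stage_order)
order = list(TopologicalSorter(child_deps).static_order())
```

With this expansion, run_dbt_models cannot be scheduled before copy_to_redshift, and none of the child steps needs an extra input in its own definition.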