Allow access to context in VersionStrategy
First of all, I'd like to thank you for this amazing work. Given all the great features, we plan on using dagster as our main orchestration lib soon.
I have been testing the new versioning and memoization recently, and have stumbled on how to use them in nodes designed as factories (more details below). Note that I am a new user, so I am not entirely sure whether this requires a new feature or results from me not using the lib correctly.
Use Case
We are mainly implementing nodes as factories. For example, we have a node make_dataset, which takes a string in config and returns either dataset_A or dataset_B. In pseudocode:
```python
@solid(version=???)  # pseudocode: the version depends on config known only at runtime
def make_dataset(context):
    name = context.solid_config["name"]
    return get_dataset(name)

run_config = {
    "solids": {
        "make_dataset": {
            "config": {"name": "dataset_A"}
        }
    }
}
```
Now, I want to implement versioning of this node, in order to use the memoization feature.
- I can implement a function like get_dataset_version(dataset_name: str) -> str (for example using DVC), but I don't know how to use it, as the dataset name will be known only at runtime.
- The other option would be to have only one global version (like the aggregation of the DVC versions of all the datasets). However, this seems awkward to me, as it would change the signature of the whole node whenever we add or update a dataset.
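To make the trade-off between the two options concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: DATASET_HASHES is a hypothetical stand-in for whatever a DVC lookup would return, and get_global_version is a hypothetical helper that aggregates all dataset versions into one string. It does not use any dagster API.

```python
import hashlib
from typing import Dict

# Hypothetical stand-in for a DVC lookup: dataset name -> content hash.
DATASET_HASHES: Dict[str, str] = {
    "dataset_A": "a1b2c3",
    "dataset_B": "d4e5f6",
}


def get_dataset_version(dataset_name: str, hashes: Dict[str, str] = DATASET_HASHES) -> str:
    """Option 1: one version per dataset, resolved from the dataset name."""
    return hashes[dataset_name]


def get_global_version(hashes: Dict[str, str] = DATASET_HASHES) -> str:
    """Option 2: a single version aggregated over every known dataset."""
    joined = ",".join(f"{name}={hashes[name]}" for name in sorted(hashes))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Simulate an update of dataset_B only.
updated_hashes = dict(DATASET_HASHES, dataset_B="0f9e8d")

# The per-dataset version of dataset_A is unaffected by the update ...
a_unchanged = get_dataset_version("dataset_A", updated_hashes) == get_dataset_version("dataset_A")
# ... whereas the global version changes, which would invalidate memoized
# results for dataset_A as well.
global_changed = get_global_version(updated_hashes) != get_global_version()
```

With a global version, any dataset update would force a recomputation of every dataset the node can produce, which is why the per-dataset version is the more attractive option if the config can be reached at version-resolution time.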
Currently, the version field of the solid decorator expects just a string, and VersionStrategy only has access to the solid definition, which does not have access to the runtime context. I am not aware of other ways to add version information to a solid.
Of course, I could have a different (versioned) node for each dataset and produce the graph dynamically. However, this would break dagit (and possibly a bunch of other features?).
Ideas of Implementation
The signatures of VersionStrategy.get_solid_version and VersionStrategy.get_resource_version could be modified to have access to a solid context, and the code in resolve_versions.py (l. 100+) would have to be adapted. This seems like a small modification, but I am not aware of what it could break.
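As a rough illustration of the proposed signature change, the sketch below uses plain Python stand-ins rather than dagster's real classes. SolidVersionContext and ConfigAwareVersionStrategy are hypothetical names introduced only for this example; the point is simply that a strategy receiving the solid config can return a version that depends on it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class SolidVersionContext:
    """Hypothetical context object handed to the strategy (not a dagster class)."""

    solid_name: str
    solid_config: Dict[str, Any] = field(default_factory=dict)


class ConfigAwareVersionStrategy:
    """Hypothetical strategy whose solid version depends on the solid config."""

    def __init__(self, dataset_versions: Dict[str, str]) -> None:
        # Assumed mapping of dataset name -> version, e.g. obtained from DVC.
        self._dataset_versions = dataset_versions

    def get_solid_version(self, context: SolidVersionContext) -> str:
        # The dataset name comes from the config, so the version is resolved
        # per run instead of being fixed in the decorator.
        name = context.solid_config["name"]
        return self._dataset_versions[name]


strategy = ConfigAwareVersionStrategy({"dataset_A": "v1", "dataset_B": "v7"})
version_a = strategy.get_solid_version(
    SolidVersionContext("make_dataset", {"name": "dataset_A"})
)
version_b = strategy.get_solid_version(
    SolidVersionContext("make_dataset", {"name": "dataset_B"})
)
```

Under this assumption, the same make_dataset node gets a different version for each configured dataset, so memoization could be keyed per dataset without generating one node per dataset.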
Additional Info
An alternative would be to have dagster manage pipeline templates, with placeholder nodes. Although it could be a big step toward managing fully dynamic DAGs, this certainly seems like a much bigger change. What do you think?
Message from the maintainers:
Excited about this feature? Give it a 👍. We factor engagement into prioritization.
Issue Analytics
- Created 2 years ago
- Comments: 10 (10 by maintainers)
FYI @dpeng817
Sorry for the delay, I finally made that PR. I look forward to your feedback.