How to output Artifacts (Model, Metrics, Dataset, etc.) without using a Python-based component?
Using the v2 SDK and the Vertex Pipelines environment, is it possible to create a reusable component (i.e. manually write a component.yaml file) that consumes and/or generates the new Artifact types such as Model, Metrics, Dataset, etc.?
My understanding of these Artifact types is that they are a value/path/reference along with associated metadata. When passing or consuming these in a non-Python-based component, it seems I can only reference or generate an Artifact’s path and nothing else.
For example, in the v1 SDK, it was possible to generate metrics that could be visualized just by dumping a JSON object to the given output path. This made it possible to use non-Python-based components to generate metrics and other metadata.
Is such a thing possible in v2/Vertex Pipelines? If not, is it on the roadmap or is the recommendation to port all components to lightweight Python components?
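For readers unfamiliar with the v1 behaviour mentioned above, here is a minimal sketch of what "dumping a JSON object to the given output path" could look like. The helper name `write_v1_metrics` is hypothetical, and the `metrics` / `numberValue` / `format` layout is my understanding of the v1 metrics file, so check it against the SDK version you use. It is written in Python only for brevity; any language that can write a JSON file could do the same.

```python
import json
from pathlib import Path


def write_v1_metrics(output_path: str, metrics: dict) -> dict:
    """Write metrics as a JSON file in the (assumed) v1 metrics layout.

    `output_path` is whatever path the pipeline backend hands to the
    container for the metrics output; nothing here depends on the KFP SDK.
    """
    payload = {
        "metrics": [
            # "RAW" shows the number as-is; "PERCENTAGE" is the other format.
            {"name": name, "numberValue": float(value), "format": "RAW"}
            for name, value in metrics.items()
        ]
    }
    path = Path(output_path)
    # The output directory may not exist yet inside the container.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload))
    return payload
```

The point of the sketch is that the container needs no SDK at all: the backend only looks at the file written to the output path.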
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 7
- Comments: 13 (3 by maintainers)
Top GitHub Comments
I found a somewhat hacky solution to this problem. I’m using Kubeflow’s `Executor` class (which is the one used by function-based components) to easily instantiate the Artifact objects. I could iterate through `executor_input` and create all the objects myself, but I think it’s a lot more convenient to use `Executor`, even if I’m not using it for what it was designed for.

You need to include `{executorInput: null}` in your component.yaml file so that it is passed to your Python script; the full script is in the attached files.

I’m also attaching all the files necessary to run this example, as well as some screenshots to show you that it works (at least on Vertex AI Pipelines). Just so we don’t have to build and publish a Docker image, I included the Python script in the component.yaml file.
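As an illustration of the manual alternative mentioned above (iterating through `executor_input` yourself instead of using `Executor`), here is a rough stdlib-only sketch. It is not the commenter's script. The function name `write_executor_output` is hypothetical, and the field names (`outputs`, `artifacts`, `outputFile`, `uri`, `metadata`) reflect my understanding of the v2 executor input/output JSON, so treat them as assumptions and verify them against your KFP version.

```python
import json
import os


def write_executor_output(executor_input: dict, metadata_by_output: dict) -> dict:
    """Attach metadata to each declared output artifact and write the result.

    Assumed layout: executor_input["outputs"]["artifacts"][<name>]["artifacts"]
    is a list of artifact dicts, and executor_input["outputs"]["outputFile"]
    is the path where the backend expects the executor output JSON.
    """
    outputs = executor_input.get("outputs", {})
    result = {"artifacts": {}}
    for name, artifact_list in outputs.get("artifacts", {}).items():
        updated = []
        for artifact in artifact_list.get("artifacts", []):
            updated.append({
                "name": artifact.get("name", ""),
                "uri": artifact.get("uri", ""),
                # Keep any existing metadata and add ours on top.
                "metadata": {
                    **artifact.get("metadata", {}),
                    **metadata_by_output.get(name, {}),
                },
            })
        result["artifacts"][name] = {"artifacts": updated}

    output_file = outputs.get("outputFile")
    if output_file:
        os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
        with open(output_file, "w") as f:
            json.dump(result, f)
    return result
```

In a container, the script would typically parse the JSON string that the `{executorInput: null}` placeholder expands to (for example from a command-line argument) and call this function once the real outputs have been written to each artifact's path.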
Code:
code.zip
Screenshots:
Edit: after commenting I realized that what I did is more or less what was suggested in https://github.com/kubeflow/pipelines/issues/6116#issuecomment-885506281. So I just wanted to give them credit.
@chensun Thanks for the explanation. I think the v2 SDK docs for “regular” component building should state that these Artifact types are not usable and that users wishing to implement these inputs/outputs should instead write them using Python-function-based components. The current docs are misleading in this regard and make it seem like there is feature parity between the two methods of implementing components.