
How to output Artifacts (Model, Metrics, Dataset, etc.) without using Python-based component?

See original GitHub issue

Using v2 SDK and Vertex Pipelines environment, is it possible to create a reusable component (i.e. manually write a component.yaml file) that consumes and/or generates the new Artifact types such as Model, Metrics, Dataset, etc.?

My understanding of these Artifact types is that they are a value/path/reference along with associated metadata. When passing or consuming them in a non-Python-based component, it seems I can only reference or generate an Artifact's path and nothing else.
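
Concretely, the most a container component seems to be able to declare is something like the following sketch (hypothetical component name and image; the {outputPath: ...} placeholder is the standard component-spec syntax), where the container only ever receives the resolved path:

name: train-model  # hypothetical
outputs:
  - {name: model, type: Model}
implementation:
  container:
    image: gcr.io/my-project/train:latest  # placeholder image
    command: [python, train.py, --model-path, {outputPath: model}]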

For example, in the v1 SDK, it was possible to generate metrics that could be visualized simply by dumping a JSON object to the given output path. This made it possible to use non-Python-based components to generate metrics and other metadata.
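
The v1 pattern was roughly this (a sketch from memory; the flag name is illustrative, and the schema follows the old mlpipeline-metrics convention):

import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("--metrics-path", type=str, required=True)  # resolved via {outputPath: ...}
args = parser.parse_args()

# v1 convention: a JSON object with a "metrics" list that the UI renders
metrics = {"metrics": [
    {"name": "accuracy-score", "numberValue": 0.9, "format": "PERCENTAGE"},
]}
with open(args.metrics_path, "w") as f:
    json.dump(metrics, f)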

Is such a thing possible in v2/Vertex Pipelines? If not, is it on the roadmap or is the recommendation to port all components to lightweight Python components?

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 7
  • Comments: 13 (3 by maintainers)

Top GitHub Comments

8 reactions
jordyantunes commented, Mar 4, 2022

I found a somewhat hacky solution to this problem. I’m using Kubeflow’s Executor class (the one used by function-based components) to easily instantiate the Artifact objects. I could iterate through executor_input and create all the objects myself, but it’s a lot more convenient to use Executor, even if I’m not using it for what it was designed for.

You need to include {executorInput: null} in your component.yaml file, and your Python script would look something like this:

from kfp.v2.components.executor import Executor
from kfp.v2.dsl import Metrics, Model
import argparse
import json


parser = argparse.ArgumentParser()
parser.add_argument("--executor-input", type=str, required=True)

args = parser.parse_args()

# parse the executor input JSON that KFP passes to the container
executor_input = json.loads(args.executor_input)

# let Executor build the artifact objects (the lambda is a no-op stand-in,
# since we are not running an actual Python-function component)
executor = Executor(executor_input, lambda x: x)

# grab the Kubeflow output artifact objects (note: private attribute)
metrics: Metrics = executor._output_artifacts['metrics']
model: Model = executor._output_artifacts['model']

# log metrics
metrics.log_metric("accuracy", 0.9)

# save the model
with open(model.path, "w") as f:
    f.write("data")

# write the executor output file so the outputs get recorded
executor._write_executor_output()

I’m also attaching all the files necessary to run this example, as well as some screenshots to show that it works (at least on Vertex AI Pipelines). Just so we don’t have to build and publish a Docker image, I included the Python script in the component.yaml file.
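
The shape of the component.yaml is roughly this (a sketch; the component name and image are placeholders, and in the attached example the script body is inlined into the command instead of baked into an image):

name: my-component  # hypothetical
outputs:
  - {name: model, type: Model}
  - {name: metrics, type: Metrics}
implementation:
  container:
    image: python:3.9  # placeholder image
    command: [python, /app/main.py, --executor-input, {executorInput: null}]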

Code:

code.zip

Screenshots:

[Two screenshots of the working pipeline run on Vertex AI, captured 2022-03-04.]

Edit: after commenting I realized what I did was more or less what was suggested in https://github.com/kubeflow/pipelines/issues/6116#issuecomment-885506281, so I just wanted to give them credit.

3 reactions
parthmishra commented, Jul 23, 2021

@chensun

Thanks for the explanation. I think the v2 SDK docs for “regular” component building should state that these Artifact types are not usable there, and that users who want these inputs/outputs should write Python-function-based components instead. The current docs are misleading in this regard and make it seem like there is full feature parity between the two ways of implementing components.
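
For reference, the Python-function-based route looks roughly like this (a minimal sketch using the kfp.v2 decorator API; the function name and values are illustrative):

from kfp.v2.dsl import component, Output, Metrics, Model

@component(base_image="python:3.9")
def train(metrics: Output[Metrics], model: Output[Model]):
    # here the SDK instantiates the artifact objects for you
    metrics.log_metric("accuracy", 0.9)
    with open(model.path, "w") as f:
        f.write("data")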


