
How to output Artifacts (Model, Metrics, Dataset, etc.) without using Python-based component?

See original GitHub issue

Using v2 SDK and Vertex Pipelines environment, is it possible to create a reusable component (i.e. manually write a component.yaml file) that consumes and/or generates the new Artifact types such as Model, Metrics, Dataset, etc.?

My understanding of these Artifact types is that they are a value/path/reference along with associated metadata. When passing or consuming them in a non-Python-based component, it seems I can only reference or generate an Artifact's path and nothing else.
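
Concretely, the most a container component seems to be able to declare is something like the following sketch (hypothetical component name and image; the {outputPath: ...} placeholder is the standard component-spec syntax), where the container only ever receives the resolved path:

name: train-model  # hypothetical
outputs:
  - {name: model, type: Model}
implementation:
  container:
    image: gcr.io/my-project/train:latest  # placeholder image
    command: [python, train.py, --model-path, {outputPath: model}]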

For example, in the v1 SDK, it was possible to generate metrics that could be visualized simply by dumping a JSON object to the given output path. This made it possible to use non-Python-based components to generate metrics and other metadata.
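
The v1 pattern was roughly this (a sketch from memory; the flag name is illustrative, and the schema follows the old mlpipeline-metrics convention):

import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("--metrics-path", type=str, required=True)  # resolved via {outputPath: ...}
args = parser.parse_args()

# v1 convention: a JSON object with a "metrics" list that the UI renders
metrics = {"metrics": [
    {"name": "accuracy-score", "numberValue": 0.9, "format": "PERCENTAGE"},
]}
with open(args.metrics_path, "w") as f:
    json.dump(metrics, f)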

Is such a thing possible in v2/Vertex Pipelines? If not, is it on the roadmap or is the recommendation to port all components to lightweight Python components?

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 7
  • Comments: 13 (3 by maintainers)

Top GitHub Comments

8 reactions
jordyantunes commented, Mar 4, 2022

I found a somewhat hacky solution to this problem. I’m using Kubeflow’s Executor class (the one used by function-based components) to easily instantiate the Artifact objects. I could iterate through executor_input and create all the objects myself, but it’s a lot more convenient to use Executor, even if I’m not using it for what it was designed for.

You need to include {executorInput: null} in your component.yaml file, and your Python script would look something like this:

from kfp.v2.components.executor import Executor
from kfp.v2.dsl import Metrics, Model
import argparse
import json


parser = argparse.ArgumentParser()
parser.add_argument("--executor-input", type=str, required=True)

args = parser.parse_args()

# parse the executor input JSON that KFP passes to the container
executor_input = json.loads(args.executor_input)

# let Executor build the artifact objects (the lambda is a no-op stand-in,
# since we are not running an actual Python-function component)
executor = Executor(executor_input, lambda x: x)

# grab the Kubeflow output artifact objects (note: private attribute)
metrics: Metrics = executor._output_artifacts['metrics']
model: Model = executor._output_artifacts['model']

# log metrics
metrics.log_metric("accuracy", 0.9)

# save the model
with open(model.path, "w") as f:
    f.write("data")

# write the executor output file so the outputs get recorded
executor._write_executor_output()

I’m also attaching all the files necessary to run this example, as well as some screenshots to show that it works (at least on Vertex AI Pipelines). Just so we don’t have to build and publish a Docker image, I included the Python script in the component.yaml file.
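
The shape of the component.yaml is roughly this (a sketch; the component name and image are placeholders, and in the attached example the script body is inlined into the command instead of baked into an image):

name: my-component  # hypothetical
outputs:
  - {name: model, type: Model}
  - {name: metrics, type: Metrics}
implementation:
  container:
    image: python:3.9  # placeholder image
    command: [python, /app/main.py, --executor-input, {executorInput: null}]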

Code:

code.zip

Screenshots:

[Two screenshots of the working pipeline run on Vertex AI, captured 2022-03-04.]

Edit: after commenting I realized what I did was more or less what was suggested in https://github.com/kubeflow/pipelines/issues/6116#issuecomment-885506281, so I just wanted to give them credit.

3 reactions
parthmishra commented, Jul 23, 2021

@chensun

Thanks for the explanation. I think the v2 SDK docs for “regular” component building should state that these Artifact types are not usable there, and that users who want these inputs/outputs should write Python-function-based components instead. The current docs are misleading in this regard and make it seem like there is full feature parity between the two ways of implementing components.
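
For reference, the Python-function-based route looks roughly like this (a minimal sketch using the kfp.v2 decorator API; the function name and values are illustrative):

from kfp.v2.dsl import component, Output, Metrics, Model

@component(base_image="python:3.9")
def train(metrics: Output[Metrics], model: Output[Model]):
    # here the SDK instantiates the artifact objects for you
    metrics.log_metric("accuracy", 0.9)
    with open(model.path, "w") as f:
        f.write("data")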


