
V2 Output Artifact Classes and Vertex Pipelines

See original GitHub issue

I am trying to create a Vertex pipeline using the kfp SDK v2. I’m not sure if this is a Vertex issue or a kfp issue, so forgive me if this is the wrong place for this query.

I have a reusable component in my pipeline from which I want to return a Dataset Artifact.

In the component.yaml I have the output specified:

outputs:
    - name: model_configuration
      description: output dataset describing model configuration
      type: Dataset

and likewise in the command section of the yaml:

--model_configuration, {outputPath: model_configuration}

Then in the function implementing the component’s logic, I declare a function parameter for the output like so: output_model_configuration_output: Output[Dataset]
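For context, a stripped-down shape of that component function (the parameter is from my code; the function name and body are placeholders):

from kfp.v2.dsl import Dataset, Output

def create_model_configuration(
    output_model_configuration_output: Output[Dataset],
):
    # component logic builds the dataset, then records where it was written
    ...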

In the artifact types class (declared here: https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/v2/components/types/artifact_types.py) I can see there is what looks like a method for setting the path of the artifact, output_artifact.path('path/to/file'), but when I call this in my code (output_model_configuration_output.path(f"{output_path}model_configuration.parquet")), I am met with an error:

TypeError: 'NoneType' object is not callable
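Re-reading artifact_types.py, path appears to be defined as a property with a setter rather than a callable method, which would explain the error: the getter seems to return None when the artifact’s uri has no recognized scheme, and the call then tries to invoke that None. Assuming that reading is right, assignment would be the usage (names mirror my snippet above):

# Sketch, assuming path is a property with a setter (per artifact_types.py)
output_model_configuration_output.path = (
    f"{output_path}model_configuration.parquet"
)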

I tried writing the URI to the artifact object’s uri attribute directly like so:

output_model_configuration_output.uri = f"{output_path}model_configuration.parquet"

This didn’t throw an error, but the artifact’s URI displayed in the Vertex Pipelines UI was not updated when the pipeline completed.

In addition, I tried adding some metadata to the artifact in this manner: output_model_configuration_output.metadata['num_rows'] = float(len(model_configuration))

But I don’t see this metadata reflected in the Vertex Pipelines UI when the pipeline run finishes, the same as with the updated URI.

Let me know if there is any more information I can provide, or if there is a more appropriate channel for this query.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6

Top GitHub Comments

2 reactions
ml6-liam commented, Nov 1, 2021

Hi,

I have found a way that works. In the end we used the kfp SDK to generate a yaml file from a @component-decorated Python function, and then adapted that format for our reusable components. Our component.yaml now looks like this:

name: predict
description: Prepare and create predictions request
implementation:
    container:
      args:
      - --executor_input
      - executorInput: null
      - --function_to_execute
      - predict
      command:
      - python3
      - -m
      - kfp.v2.components.executor_main
      - --component_module_path
      - predict.py
      image: gcr.io/PROJECT_ID/kfp/components/predict:latest
inputs: 
    - name: input_1
      type: String
    - name: input_2
      type: String
outputs:
    - name: output_1
      type: Dataset
    - name: output_2
      type: Dataset

With this change to the yaml, we can now successfully update the artifact’s metadata dictionary and its uri through artifact.path = '/path/to/file'. These updates are displayed in the Vertex UI.
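For reference, a minimal sketch of what the predict.py module referenced by --component_module_path could contain under this pattern (the body here is illustrative, not our actual code):

from kfp.v2.dsl import Dataset, Output

def predict(
    input_1: str,
    input_2: str,
    output_1: Output[Dataset],
    output_2: Output[Dataset],
):
    # kfp.v2.components.executor_main builds these Output[Dataset] handles
    # from --executor_input, so path/uri and metadata updates are written
    # back to Vertex when the task finishes.
    with open(output_1.path, 'w') as f:
        f.write('placeholder payload')
    output_1.metadata['num_rows'] = 1.0
    # output_2 would be handled the same way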

I am still unsure why the component.yaml format specified in the Kubeflow documentation does not work - I think this may be a bug with Vertex Pipelines.
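And the yaml-generation step mentioned at the top of this comment would look roughly like this (a sketch; the base image and output file name are assumed):

from kfp.v2.dsl import Dataset, Output, component

@component(
    base_image='python:3.9',  # assumed; we use our own image
    output_component_file='predict_component.yaml',  # yaml we then adapted
)
def predict(
    input_1: str,
    input_2: str,
    output_1: Output[Dataset],
    output_2: Output[Dataset],
):
    ...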

0 reactions
chensun commented, Nov 12, 2021

Also, you might want to take a look at https://github.com/kubeflow/pipelines/pull/6417#issue-977634071, which would help you build your reusable components with full v2 feature support.


Top Results From Across the Web

Use Google Cloud Pipeline Components | Vertex AI
Some Google Cloud Pipeline Components consume these artifacts as input or produce them as output. This page shows how to consume and produce...

How to use previously created artifacts with Vertex AI Pipelines
Those components produce artifacts as output and use them as input for other pre-built components.

Using Vertex ML Metadata with Pipelines - Google Codelabs
Write custom pipeline components that generate artifacts and metadata; Compare Vertex Pipelines runs, both in the Cloud console and ...

MLOps on GCP - Part 1: Deploy a Vertex AI Training Pipeline ...
Vertex AI Training Pipeline for scikit-learn models ... Training Step. from kfp.v2.dsl import ( Artifact, Dataset, Input, Output, component, ) ...

Vertex Pipelines: Qwik Start | Google Cloud Skills Boost
Imagine you’re building out a ML workflow that includes processing data, training a model, hyperparameter tuning, evaluation, and model ...
