
V2 Output Artifact Classes and Vertex Pipelines

See original GitHub issue

I am trying to create a Vertex pipeline using the kfp SDK v2. I’m not sure if this is a Vertex issue or a kfp issue, so forgive me if this is the wrong place for this query.

I have a reusable component in my pipeline from which I want to return a Dataset Artifact.

In the component.yaml I have the output specified:

outputs:
    - name: model_configuration
      description: output dataset describing model configuration
      type: Dataset

and likewise in the command section of the yaml:

--model_configuration, {outputPath: model_configuration}

Then in the function implementing the component’s logic, I declare a function parameter for the output like so: output_model_configuration_output: Output[Dataset]
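For context, a stripped-down shape of that component function (the parameter is from my code; the function name and body are placeholders):

from kfp.v2.dsl import Dataset, Output

def create_model_configuration(
    output_model_configuration_output: Output[Dataset],
):
    # component logic builds the dataset, then records where it was written
    ...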

In the artifact types class (declared here: https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/v2/components/types/artifact_types.py) I can see there is what looks like a method for setting the path of the artifact, output_artifact.path('path/to/file'), but when I call this in my code (output_model_configuration_output.path(f"{output_path}model_configuration.parquet")), I am met with an error:

TypeError: 'NoneType' object is not callable
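Re-reading artifact_types.py, path appears to be defined as a property with a setter rather than a callable method, which would explain the error: the getter seems to return None when the artifact’s uri has no recognized scheme, and the call then tries to invoke that None. Assuming that reading is right, assignment would be the usage (names mirror my snippet above):

# Sketch, assuming path is a property with a setter (per artifact_types.py)
output_model_configuration_output.path = (
    f"{output_path}model_configuration.parquet"
)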

I tried writing the URI to the artifact object’s uri attribute directly like so:

output_model_configuration_output.uri = f"{output_path}model_configuration.parquet"

This didn’t throw an error, but the artifact’s URI displayed in the Vertex Pipelines UI was not updated when the pipeline completed.

In addition, I tried adding some metadata to the artifact in this manner: output_model_configuration_output.metadata['num_rows'] = float(len(model_configuration))

But I don’t see this metadata reflected in the Vertex Pipelines UI when the pipeline run finishes, the same as with the updated URI.

Let me know if there is any more information I can provide, or if there is a more appropriate channel for this query.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6

Top GitHub Comments

2 reactions
ml6-liam commented, Nov 1, 2021

Hi,

I have found a way that works. In the end we used the kfp SDK to generate a yaml file from a @component-decorated Python function, and then adapted that format for our reusable components. Our component.yaml now looks like this:

name: predict
description: Prepare and create predictions request
implementation:
    container:
      args:
      - --executor_input
      - executorInput: null
      - --function_to_execute
      - predict
      command:
      - python3
      - -m
      - kfp.v2.components.executor_main
      - --component_module_path
      - predict.py
      image: gcr.io/PROJECT_ID/kfp/components/predict:latest
inputs: 
    - name: input_1
      type: String
    - name: input_2
      type: String
outputs:
    - name: output_1
      type: Dataset
    - name: output_2
      type: Dataset

With this change to the yaml, we can now successfully update the artifact’s metadata dictionary and its uri through artifact.path = '/path/to/file'. These updates are displayed in the Vertex UI.
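For reference, a minimal sketch of what the predict.py module referenced by --component_module_path could contain under this pattern (the body here is illustrative, not our actual code):

from kfp.v2.dsl import Dataset, Output

def predict(
    input_1: str,
    input_2: str,
    output_1: Output[Dataset],
    output_2: Output[Dataset],
):
    # kfp.v2.components.executor_main builds these Output[Dataset] handles
    # from --executor_input, so path/uri and metadata updates are written
    # back to Vertex when the task finishes.
    with open(output_1.path, 'w') as f:
        f.write('placeholder payload')
    output_1.metadata['num_rows'] = 1.0
    # output_2 would be handled the same way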

I am still unsure why the component.yaml format specified in the Kubeflow documentation does not work - I think this may be a bug with Vertex Pipelines.
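And the yaml-generation step mentioned at the top of this comment would look roughly like this (a sketch; the base image and output file name are assumed):

from kfp.v2.dsl import Dataset, Output, component

@component(
    base_image='python:3.9',  # assumed; we use our own image
    output_component_file='predict_component.yaml',  # yaml we then adapted
)
def predict(
    input_1: str,
    input_2: str,
    output_1: Output[Dataset],
    output_2: Output[Dataset],
):
    ...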

0 reactions
chensun commented, Nov 12, 2021

Also, you might want to take a look at https://github.com/kubeflow/pipelines/pull/6417#issue-977634071, which would help you build your reusable components with full v2 feature support.


Top Results From Across the Web

Use Google Cloud Pipeline Components | Vertex AI
Some Google Cloud Pipeline Components consume these artifacts as input or produce them as output. This page shows how to consume and produce...

How to use previously created artifacts with Vertex AI Pipelines
Those components produce artifacts as output and use them as input for other pre-built components.

Using Vertex ML Metadata with Pipelines - Google Codelabs
Write custom pipeline components that generate artifacts and metadata; Compare Vertex Pipelines runs, both in the Cloud console and ...

MLOps on GCP - Part 1: Deploy a Vertex AI Training Pipeline ...
Vertex AI Training Pipeline for scikit-learn models ... Training Step. from kfp.v2.dsl import ( Artifact, Dataset, Input, Output, component, ) ...

Vertex Pipelines: Qwik Start | Google Cloud Skills Boost
Imagine you’re building out a ML workflow that includes processing data, training a model, hyperparameter tuning, evaluation, and model ...
