[feature] google-cloud component for loading existing VertexDataset
Feature Area
/area sdk /area samples /area components
What feature would you like to see?
A new component to load an existing VertexDataset. Related to #7792
What is the use case or pain point?
As a user, I have an existing dataset in Vertex AI and am running several experiments with different models, each represented by its own pipeline.
When developing a Kubeflow pipeline for Vertex AI, I would like to load that existing VertexDataset instead of using the dataset-creation component. Today, however, no dataset-loading component exists, so I cannot do this.
Is there a workaround currently?
No; today I am not able to accomplish this. I tried the following:
```python
@component(base_image="python:3.9", packages_to_install=["google-cloud-aiplatform"])
def get_data(
    project: str,
    region: str,
    bucket: str,
    dataset: Output[VertexDataset],
):
    from google.cloud import aiplatform
    dataset = aiplatform.datasets._Dataset(TEST_ID, project=project, location=region)
```
This attempt fails with the following error: NameError: name 'VertexDataset' is not defined
Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.
Issue Analytics
- State:
- Created a year ago
- Reactions:2
- Comments:9 (5 by maintainers)
Top GitHub Comments
@connor-mccarthy Do you have any update regarding this issue?
Current work-arounds are insufficient
The work-around I have been using is the importer_node, as mentioned in this article. While this should intuitively work for loading Artifacts, and it functionally does the job, it duplicates entries within the ML Metadata store in the Vertex AI project.
Loading existing Artifacts is a key MLOps functionality
As a user, it seems there is clearly something missing that would allow one to link an Artifact to multiple pipelines and multiple pipeline runs without duplicating ML Metadata entries within Vertex AI. Use cases include running multiple training runs with different models on the same input Dataset, using the same trained model on multiple datasets, and re-using a trained model artifact for model evaluation and deployment in separate pipelines.
Thanks, @defoishugo. This development is currently in progress and should be released with an upcoming v2 alpha release!