[feature] google-cloud component for loading existing VertexDataset

Feature Area

/area sdk /area samples /area components

What feature would you like to see?

A new component to load existing VertexDataset. Related to #7792

What is the use case or pain point?

As a user, I have one existing dataset in VertexAI. I am running several experiments with different models. Each of my experiments is represented by a pipeline.

When developing a Kubeflow pipeline for VertexAI, I would like to be able to load an existing VertexDataset instead of using the dataset creation component. Today, no dataset reading component exists, so I am not able to do it.

Is there a workaround currently?

Today, I am not able to do this. I tried the following:

@component(base_image="python:3.9", packages_to_install=["google-cloud-aiplatform"])
def get_data(
    project: str,
    region: str,
    bucket: str,
    dataset: Output[VertexDataset]
):
    from google.cloud import aiplatform
    dataset = aiplatform.datasets._Dataset(TEST_ID, project=project, location=region)

This raises the following error: NameError: name 'VertexDataset' is not defined.


Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions:2
  • Comments:9 (5 by maintainers)

Top GitHub Comments

2 reactions
adhaene-noimos commented, Oct 12, 2022

@connor-mccarthy Do you have any update regarding this issue?

Current work-arounds are insufficient. The work-around I have been using is the importer_node, as mentioned in this article. While this should intuitively work for loading Artifacts, and functionally does the job, it duplicates entries within the ML Metadata store in the VertexAI project.
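For readers unfamiliar with the importer work-around mentioned here, the following is a minimal sketch of how its inputs could be assembled. The helper name `vertex_dataset_importer_args` and the example project, region and dataset ID are hypothetical. The URI layout and the `resourceName` metadata key are assumptions based on how google-cloud-pipeline-components artifacts are commonly described, not something confirmed in this thread.

```python
from typing import Any, Dict


def vertex_dataset_importer_args(project: str, region: str, dataset_id: str) -> Dict[str, Any]:
    """Build keyword arguments describing an existing Vertex AI dataset.

    Assumption: the artifact URI is the regional API endpoint followed by the
    dataset resource name, and the resource name is repeated in the metadata.
    """
    resource_name = f"projects/{project}/locations/{region}/datasets/{dataset_id}"
    return {
        "artifact_uri": f"https://{region}-aiplatform.googleapis.com/v1/{resource_name}",
        "metadata": {"resourceName": resource_name},
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    args = vertex_dataset_importer_args("my-project", "us-central1", "1234567890")
    print(args["artifact_uri"])
    print(args["metadata"])
    # Inside a pipeline definition these could be passed to an importer node
    # (assumption: KFP v2 SDK and google-cloud-pipeline-components are installed):
    #   from kfp import dsl
    #   from google_cloud_pipeline_components.types import artifact_types
    #   dataset_op = dsl.importer(artifact_class=artifact_types.VertexDataset, **args)
```

This only prepares the arguments. It does not avoid the duplicated ML Metadata entries described in this comment, which is the limitation the feature request is about.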

Loading existing Artifacts is a key MLOps functionality. As a user, I find that something is clearly missing: a way to link an Artifact to multiple pipelines and multiple pipeline runs without duplicating ML Metadata entries within VertexAI. Use cases include running multiple training runs using different models on the same input Dataset, using the same trained model on multiple datasets, and re-using the trained model artifact for model evaluation and deployment in separate pipelines.

2 reactions
connor-mccarthy commented, Jun 9, 2022

Thanks, @defoishugo. This development is currently in progress and should be released with an upcoming v2 alpha release!
