[feature] google-cloud component for loading existing VertexDataset
Feature Area
/area sdk /area samples /area components
What feature would you like to see?
A new component to load an existing VertexDataset. Related to #7792
What is the use case or pain point?
As a user, I have an existing dataset in Vertex AI and am running several experiments with different models, each represented by its own pipeline.
When developing a Kubeflow pipeline for Vertex AI, I would like to load that existing VertexDataset instead of using the dataset-creation component. Today, however, no dataset-loading component exists, so I cannot do this.
Is there a workaround currently?
No; today I am not able to accomplish this. I tried the following:
```python
@component(base_image="python:3.9", packages_to_install=["google-cloud-aiplatform"])
def get_data(
    project: str,
    region: str,
    bucket: str,
    dataset: Output[VertexDataset],
):
    from google.cloud import aiplatform
    dataset = aiplatform.datasets._Dataset(TEST_ID, project=project, location=region)
```
This attempt fails with the following error: NameError: name 'VertexDataset' is not defined
Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.
Issue Analytics
- State:
- Created a year ago
- Reactions:2
- Comments:9 (5 by maintainers)
Top GitHub Comments
@connor-mccarthy Do you have any update regarding this issue?
Current work-arounds are insufficient
The work-around I have been using is the importer_node, as mentioned in this article. While this should intuitively work for loading Artifacts, and it functionally does the job, it duplicates entries within the ML Metadata store in the Vertex AI project.
Loading existing Artifacts is a key MLOps functionality
As a user, it seems there is clearly something missing that would allow one to link an Artifact to multiple pipelines and multiple pipeline runs without duplicating ML Metadata entries within Vertex AI. Use cases include running multiple training runs with different models on the same input Dataset, using the same trained model on multiple datasets, and re-using a trained model artifact for model evaluation and deployment in separate pipelines.
Thanks, @defoishugo. This development is currently in progress and should be released with an upcoming v2 alpha release!