Importer of Examples does not work with KubeflowV2DagRunner
See original GitHub issue

System information
- Have I specified the code to reproduce the issue: Yes
- Environment in which the code is executed: Interactive Notebook (beam) on Google Cloud that Trigger Pipelines in Kubeflow
- TensorFlow version: 2.5.0
- TFX Version: 1.0.0rc1
- Python version: 3.7
- Python dependencies (from pip freeze output): ...
Describe the current behavior
We would like to use an Importer to reload Examples that were already generated.
1 - It works perfectly when using the local BeamDagRunner.
2 - It does not work when using the KubeflowV2DagRunner. The pipeline is created, but the importer fails with the following message: Unmapped PropertyType specified in Yaml definition: 0; Failed to parse the artifact yaml schema. Project: XXX
Note: It works for all other artifact types.
Describe the expected behavior
It should work when using the KubeflowV2DagRunner.
Standalone code to reproduce the issue
import os
import uuid
from kfp.v2.google import client
from tfx.dsl.components.common.importer import Importer
from tfx.types import standard_artifacts
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
from tfx.orchestration.kubeflow.v2.kubeflow_v2_dag_runner import KubeflowV2DagRunner, KubeflowV2DagRunnerConfig
from tfx.orchestration import pipeline
from tfx.orchestration import metadata
from tfx.components import StatisticsGen
####################################################################################################################
# /!\ To Fill with your Values /!\
_project_id = 'XXX'
_gcp_region = 'europe-west4'
_gcs_bucket_name = 'XXX'
_examples_to_import = "gs://XXX"
####################################################################################################################
# Pipeline Variables
_pipeline_name = f"test-examples-importer-{str(uuid.uuid4())[0:4]}"
_pipeline_root = 'gs://{}/pipeline_root/{}'.format(_gcs_bucket_name, _pipeline_name)
_module_root = 'gs://{}/pipeline_module/{}'.format(_gcs_bucket_name, _pipeline_name)
_temp_location = 'gs://{}/pipeline_tmp/{}'.format(_gcs_bucket_name, _pipeline_name)
_pipeline_definition_file = 'training_pipeline.json'
_beam_bq_pipeline_args = [
    '--runner=DirectRunner',
    '--direct_running_mode=multi_processing',
    '--direct_num_workers=0',
    '--temp_location=' + _temp_location,
    '--project=' + _project_id,
    '--region=' + _gcp_region,
]
####################################################################################################################
# Define a Function to Create a Simple Pipeline with 2 Components:
# - an Importer (of Examples)
# - a StatisticGen
def _create_pipeline(pipeline_name: str,
                     pipeline_root: str,
                     beam_pipeline_args: list,
                     examples_to_import: str) -> pipeline.Pipeline:
    examples_importer = Importer(
        source_uri=examples_to_import,
        artifact_type=standard_artifacts.Examples,
        properties={'split_names': '["train", "eval"]'},
        custom_properties={'tfx_version': '1.0.0'},
        reimport=False)
    statistics_gen = StatisticsGen(examples=examples_importer.outputs['result'])
    return pipeline.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[
            examples_importer,
            statistics_gen,
        ],
        beam_pipeline_args=beam_pipeline_args,
        metadata_connection_config=metadata.sqlite_metadata_connection_config(
            os.path.join(os.environ['HOME'], 'tfx_metadata', pipeline_name, 'metadata.db')),
    )
####################################################################################################################
# Create the Pipeline (named tfx_pipeline to avoid shadowing the imported `pipeline` module)
tfx_pipeline = _create_pipeline(_pipeline_name, _pipeline_root, _beam_bq_pipeline_args, _examples_to_import)
####################################################################################################################
# Run Locally -> OK
BeamDagRunner().run(tfx_pipeline)
####################################################################################################################
# Run on Vertex Pipelines -> KO
runner = KubeflowV2DagRunner(config=KubeflowV2DagRunnerConfig(), output_filename=_pipeline_definition_file)
runner.run(tfx_pipeline)
pipelines_client = client.AIPlatformClient(project_id=_project_id, region=_gcp_region)
_ = pipelines_client.create_run_from_job_spec(_pipeline_definition_file)
####################################################################################################################
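As a debugging aid, one way to narrow this down is to inspect the compiled pipeline definition file for the artifact schema the runner emits for the Examples output. The JSON below is a simplified stand-in, not the exact Vertex pipeline spec, so treat the key names as assumptions:

```python
import json

# Simplified stand-in for a compiled pipeline definition; the real
# training_pipeline.json emitted by KubeflowV2DagRunner is far larger.
pipeline_spec = {
    "pipelineSpec": {
        "components": {
            "Importer": {
                "outputDefinitions": {
                    "artifacts": {
                        "result": {
                            "artifactType": {
                                "instanceSchema": "title: tfx.Examples\ntype: object\n"
                            }
                        }
                    }
                }
            }
        }
    }
}

def artifact_schemas(spec: dict) -> list:
    """Collect every instanceSchema string found in the component outputs."""
    schemas = []
    for component in spec["pipelineSpec"]["components"].values():
        artifacts = component.get("outputDefinitions", {}).get("artifacts", {})
        for artifact in artifacts.values():
            schema = artifact.get("artifactType", {}).get("instanceSchema")
            if schema:
                schemas.append(schema)
    return schemas

print(artifact_schemas(pipeline_spec))
```

Running the same traversal over the real training_pipeline.json (loaded with json.load) would show which yaml schema the backend is being asked to parse for the Examples artifact.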
Other info / logs
The importer fails with the following message:
Unmapped PropertyType specified in Yaml definition: 0; Failed to parse the artifact yaml schema. Project: XXX
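For what it's worth, an error of the form "Unmapped PropertyType ... 0" usually points at an enum value of 0 (the unset/unknown default in proto-style enums) reaching a type-mapping step. The enum and mapping below are made up for illustration of that failure mode, not actual TFX or Vertex internals:

```python
import enum

class PropertyType(enum.IntEnum):
    # In proto-style enums, 0 is the unset/unknown default value.
    UNKNOWN = 0
    INT = 1
    DOUBLE = 2
    STRING = 3

# Hypothetical mapping from property types to yaml schema type names.
_YAML_TYPE_BY_PROPERTY = {
    PropertyType.INT: "int",
    PropertyType.DOUBLE: "double",
    PropertyType.STRING: "string",
}

def to_yaml_type(property_type: PropertyType) -> str:
    """Map a property type to its yaml schema type, rejecting unmapped values."""
    try:
        return _YAML_TYPE_BY_PROPERTY[property_type]
    except KeyError:
        raise ValueError(
            f"Unmapped PropertyType specified in Yaml definition: {int(property_type)}"
        )

print(to_yaml_type(PropertyType.STRING))  # -> string
# A property whose type was never set arrives as 0 and fails:
# to_yaml_type(PropertyType.UNKNOWN) raises ValueError
```

Under this reading, the Examples artifact's property declarations reach the backend with an unset (0) type, which would explain why other artifact types are unaffected.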
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 1
- Comments: 7 (1 by maintainers)
Top GitHub Comments
From GCP Vertex AI - Pipeline team: we are working on a backend fix that will address the issue, ETA end of next week.
Hi @axelborja,
I just tested on my end and the issue seems to be fixed.
Please give it a try and feel free to reach out if the issue persists on your end 😃