
Importer of Examples does not work with KubeflowV2DagRunner

See original GitHub issue

System information

  • Have I specified the code to reproduce the issue: Yes
  • Environment in which the code is executed: Interactive Notebook (beam) on Google Cloud that Trigger Pipelines in Kubeflow
  • TensorFlow version: 2.5.0
  • TFX Version: 1.0.0rc1
  • Python version: 3.7
  • Python dependencies (from pip freeze output):
...

Describe the current behavior

We would like to use an importer to reload Examples already generated.

1. It works perfectly when using the local BeamDagRunner.
2. It does not work when using the KubeflowV2DagRunner: the pipeline is created, but the importer fails with the message "Unmapped PropertyType specified in Yaml definition: 0; Failed to parse the artifact yaml schema. Project: XXX".

Note: it works for all other artifact types, as the sketch below illustrates.
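A minimal contrast sketch of that behavior (placeholder URIs; attributing the failure to the typed properties that Examples declares, such as split_names, is an inference from the error message, not something confirmed by the maintainers):

# Contrast sketch: the same Importer pattern with two artifact types.
from tfx.dsl.components.common.importer import Importer
from tfx.types import standard_artifacts

# Compiles and runs on KubeflowV2DagRunner (Schema declares no typed properties).
schema_importer = Importer(
    source_uri='gs://XXX/schema',
    artifact_type=standard_artifacts.Schema,
    reimport=False)

# Fails on KubeflowV2DagRunner with "Unmapped PropertyType specified in
# Yaml definition: 0" (Examples declares span, version and split_names properties).
examples_importer = Importer(
    source_uri='gs://XXX/examples',
    artifact_type=standard_artifacts.Examples,
    properties={'split_names': '["train", "eval"]'},
    reimport=False)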

Describe the expected behavior

It should work when using KubeflowV2DagRunner.

Standalone code to reproduce the issue

import os
import uuid

from kfp.v2.google import client

from tfx.dsl.components.common.importer import Importer
from tfx.types import standard_artifacts
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
from tfx.orchestration.kubeflow.v2.kubeflow_v2_dag_runner import KubeflowV2DagRunner, KubeflowV2DagRunnerConfig
from tfx.orchestration import pipeline
from tfx.orchestration import metadata
from tfx.components import StatisticsGen


####################################################################################################################
# /!\ To Fill with your Values /!\ 
_project_id = 'XXX'
_gcp_region = 'europe-west4'
_gcs_bucket_name = 'XXX'
_examples_to_import = "gs://XXX"


####################################################################################################################
# Pipeline Variables
_pipeline_name = f"test-examples-importer-{str(uuid.uuid4())[0:4]}"
_pipeline_root = 'gs://{}/pipeline_root/{}'.format(_gcs_bucket_name, _pipeline_name)
_module_root = 'gs://{}/pipeline_module/{}'.format(_gcs_bucket_name, _pipeline_name)
_temp_location = 'gs://{}/pipeline_tmp/{}'.format(_gcs_bucket_name, _pipeline_name)
_pipeline_definition_file = 'training_pipeline.json'
_beam_bq_pipeline_args = [
    '--runner=DirectRunner',
    '--direct_running_mode=multi_processing',
    '--direct_num_workers=0',
    '--temp_location=' + _temp_location,
    '--project=' + _project_id,
    '--region=' + _gcp_region
]


####################################################################################################################
# Define a Function to Create a Simple Pipeline with 2 Components:
# - an Importer (of Examples)
# - a StatisticGen
def _create_pipeline(pipeline_name: str, 
                     pipeline_root: str, 
                     beam_pipeline_args: list,
                     examples_to_import: str) -> pipeline.Pipeline:
        
    examples_importer = Importer(source_uri=examples_to_import,
                                 artifact_type=standard_artifacts.Examples,
                                 properties={'split_names': '["train", "eval"]'},
                                 custom_properties={"tfx_version": "1.0.0"},
                                 reimport=False)

    statistics_gen = StatisticsGen(examples=examples_importer.outputs['result'])

    return pipeline.Pipeline(
          pipeline_name=pipeline_name,
          pipeline_root=pipeline_root,
          components=[
              examples_importer, 
              statistics_gen
            ],
        beam_pipeline_args=beam_pipeline_args,
        metadata_connection_config=metadata.sqlite_metadata_connection_config(
            os.path.join(os.environ['HOME'], 'tfx_metadata', pipeline_name, 'metadata.db'))
      )


####################################################################################################################
# Create the Pipeline
tfx_pipeline = _create_pipeline(_pipeline_name, _pipeline_root, _beam_bq_pipeline_args, _examples_to_import)  # renamed so it does not shadow the imported `pipeline` module


####################################################################################################################
# Run Locally -> OK
BeamDagRunner().run(tfx_pipeline)


####################################################################################################################
# Run on Vertex Pipelines -> KO
runner = KubeflowV2DagRunner(config=KubeflowV2DagRunnerConfig(), output_filename=_pipeline_definition_file)
runner.run(tfx_pipeline)
pipelines_client = client.AIPlatformClient(project_id=_project_id, region=_gcp_region)
_ = pipelines_client.create_run_from_job_spec(_pipeline_definition_file)


####################################################################################################################

Other info / logs

The importer fails with the following message:

Unmapped PropertyType specified in Yaml definition: 0; Failed to parse the artifact yaml schema. Project: XXX

(Screenshot: failed pipeline run test-examples-importer-1f0d-20210709145125 in the Vertex AI console, project tinyclues-sandbox)
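The failure happens server-side, when Vertex parses the artifact schema embedded in the compiled pipeline spec. A quick way to see exactly what was submitted is to dump the JSON file the runner wrote (a diagnostic sketch reusing _pipeline_definition_file from the repro script; the output is raw pipeline IR, not a documented API):

# Diagnostic sketch: print the compiled Vertex pipeline spec so the
# Examples artifact schema the backend rejects can be inspected.
import json

with open(_pipeline_definition_file) as f:
    job_spec = json.load(f)
print(json.dumps(job_spec, indent=2))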

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

3 reactions
hilcj commented, Jul 16, 2021

From the GCP Vertex AI Pipelines team: we are working on a backend fix that will address the issue, ETA end of next week.

1 reaction
hilcj commented, Jul 27, 2021

> Hello @hilcj, any news on the fix? Thanks.

Hi @axelborja,

I just tested on my end and the issue seems to be fixed. (Screenshot omitted.)

Please give it a try, and feel free to reach out if the issue persists on your end 😃
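To re-test once the backend fix has rolled out, the repro can simply be recompiled and resubmitted (a sketch reusing runner, tfx_pipeline, and pipelines_client as defined in the script above):

# Re-test sketch: recompile and resubmit the same pipeline spec,
# reusing the objects defined in the repro script.
runner.run(tfx_pipeline)
_ = pipelines_client.create_run_from_job_spec(_pipeline_definition_file)

If the importer step now succeeds and StatisticsGen runs, the backend fix has reached your project and region.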

Read more comments on GitHub >

Top Results From Across the Web

RuntimeParam support for Transform custom_config. #4140
Today Transform does not support Runtimeparams on the custom config see: ... import KubeflowV2DagRunner, KubeflowV2DagRunnerConfig from ...
Read more >
tfx.v1.orchestration.experimental.KubeflowV2DagRunner
An KubeflowV2DagRunnerConfig object to specify runtime configuration when running the pipeline in Kubeflow. output_dir, An optional output ...
Read more >
Dual deployments on Vertex AI | Google Cloud Blog
In this post, we will cover an end-to-end workflow enabling dual model deployment scenarios using Kubeflow, TensorFlow Extended (TFX), ...
Read more >
TFX ExampleGen from CSV file - wrong split - Stack Overflow
I'm trying to import a CSV file into a TFX pipeline but I see a strange behaviour I don't understand. Here's the file...
Read more >
tfx-helper - PyPI
from tfx_helper.local import LocalPipelineHelper def run() -> None: """Create and ... that won't execute computations minimal_resources = Resources(cpu=1, ...
Read more >
