
Importer of Examples does not work with KubeflowV2DagRunner

See original GitHub issue

System information

  • Have I specified the code to reproduce the issue: Yes
  • Environment in which the code is executed: Interactive Notebook (beam) on Google Cloud that Trigger Pipelines in Kubeflow
  • TensorFlow version: 2.5.0
  • TFX Version: 1.0.0rc1
  • Python version: 3.7
  • Python dependencies (from pip freeze output):
...

Describe the current behavior

We would like to use an importer to reload Examples already generated.

1. It works perfectly when using the local BeamDagRunner.
2. It does not work when using the KubeflowV2DagRunner: the pipeline is created, but the importer fails with the message "Unmapped PropertyType specified in Yaml definition: 0; Failed to parse the artifact yaml schema. Project: XXX".

Note: it works for all other artifact types, as the sketch below illustrates.
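A minimal contrast sketch of that behavior (placeholder URIs; attributing the failure to the typed properties that Examples declares, such as split_names, is an inference from the error message, not something confirmed by the maintainers):

# Contrast sketch: the same Importer pattern with two artifact types.
from tfx.dsl.components.common.importer import Importer
from tfx.types import standard_artifacts

# Compiles and runs on KubeflowV2DagRunner (Schema declares no typed properties).
schema_importer = Importer(
    source_uri='gs://XXX/schema',
    artifact_type=standard_artifacts.Schema,
    reimport=False)

# Fails on KubeflowV2DagRunner with "Unmapped PropertyType specified in
# Yaml definition: 0" (Examples declares span, version and split_names properties).
examples_importer = Importer(
    source_uri='gs://XXX/examples',
    artifact_type=standard_artifacts.Examples,
    properties={'split_names': '["train", "eval"]'},
    reimport=False)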

Describe the expected behavior

It should work when using KubeflowV2DagRunner.

Standalone code to reproduce the issue

import os
import uuid

from kfp.v2.google import client

from tfx.dsl.components.common.importer import Importer
from tfx.types import standard_artifacts
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
from tfx.orchestration.kubeflow.v2.kubeflow_v2_dag_runner import KubeflowV2DagRunner, KubeflowV2DagRunnerConfig
from tfx.orchestration import pipeline
from tfx.orchestration import metadata
from tfx.components import StatisticsGen


####################################################################################################################
# /!\ To Fill with your Values /!\ 
_project_id = 'XXX'
_gcp_region = 'europe-west4'
_gcs_bucket_name = 'XXX'
_examples_to_import = "gs://XXX"


####################################################################################################################
# Pipeline Variables
_pipeline_name = f"test-examples-importer-{str(uuid.uuid4())[0:4]}"
_pipeline_root = 'gs://{}/pipeline_root/{}'.format(_gcs_bucket_name, _pipeline_name)
_module_root = 'gs://{}/pipeline_module/{}'.format(_gcs_bucket_name, _pipeline_name)
_temp_location = 'gs://{}/pipeline_tmp/{}'.format(_gcs_bucket_name, _pipeline_name)
_pipeline_definition_file = 'training_pipeline.json'
_beam_bq_pipeline_args = [
    '--runner=DirectRunner',
    '--direct_running_mode=multi_processing',
    '--direct_num_workers=0',
    '--temp_location=' + _temp_location,
    '--project=' + _project_id,
    '--region=' + _gcp_region
]


####################################################################################################################
# Define a Function to Create a Simple Pipeline with 2 Components:
# - an Importer (of Examples)
# - a StatisticGen
def _create_pipeline(pipeline_name: str, 
                     pipeline_root: str, 
                     beam_pipeline_args: list,
                     examples_to_import: str) -> pipeline.Pipeline:
        
    examples_importer = Importer(source_uri=examples_to_import,
                                 artifact_type=standard_artifacts.Examples,
                                 properties={'split_names': '["train", "eval"]'},
                                 custom_properties={"tfx_version": "1.0.0"},
                                 reimport=False)

    statistics_gen = StatisticsGen(examples=examples_importer.outputs['result'])

    return pipeline.Pipeline(
          pipeline_name=pipeline_name,
          pipeline_root=pipeline_root,
          components=[
              examples_importer, 
              statistics_gen
            ],
        beam_pipeline_args=beam_pipeline_args,
        metadata_connection_config=metadata.sqlite_metadata_connection_config(
            os.path.join(os.environ['HOME'], 'tfx_metadata', pipeline_name, 'metadata.db'))
      )


####################################################################################################################
# Create the Pipeline
tfx_pipeline = _create_pipeline(_pipeline_name, _pipeline_root, _beam_bq_pipeline_args, _examples_to_import)  # renamed so it does not shadow the imported `pipeline` module


####################################################################################################################
# Run Locally -> OK
BeamDagRunner().run(tfx_pipeline)


####################################################################################################################
# Run on Vertex Pipelines -> KO
runner = KubeflowV2DagRunner(config=KubeflowV2DagRunnerConfig(), output_filename=_pipeline_definition_file)
runner.run(tfx_pipeline)
pipelines_client = client.AIPlatformClient(project_id=_project_id, region=_gcp_region)
_ = pipelines_client.create_run_from_job_spec(_pipeline_definition_file)


####################################################################################################################

Other info / logs

The importer fails with the following message:

Unmapped PropertyType specified in Yaml definition: 0; Failed to parse the artifact yaml schema. Project: XXX

(Screenshot: failed pipeline run test-examples-importer-1f0d-20210709145125 in the Vertex AI console, project tinyclues-sandbox)
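The failure happens server-side, when Vertex parses the artifact schema embedded in the compiled pipeline spec. A quick way to see exactly what was submitted is to dump the JSON file the runner wrote (a diagnostic sketch reusing _pipeline_definition_file from the repro script; the output is raw pipeline IR, not a documented API):

# Diagnostic sketch: print the compiled Vertex pipeline spec so the
# Examples artifact schema the backend rejects can be inspected.
import json

with open(_pipeline_definition_file) as f:
    job_spec = json.load(f)
print(json.dumps(job_spec, indent=2))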

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

3 reactions
hilcj commented, Jul 16, 2021

From the GCP Vertex AI Pipelines team: we are working on a backend fix that will address the issue, ETA end of next week.

1 reaction
hilcj commented, Jul 27, 2021

> Hello @hilcj, any news on the fix? Thanks.

Hi @axelborja,

I just tested on my end and the issue seems to be fixed. (Screenshot omitted.)

Please give it a try, and feel free to reach out if the issue persists on your end 😃
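To re-test once the backend fix has rolled out, the repro can simply be recompiled and resubmitted (a sketch reusing runner, tfx_pipeline, and pipelines_client as defined in the script above):

# Re-test sketch: recompile and resubmit the same pipeline spec,
# reusing the objects defined in the repro script.
runner.run(tfx_pipeline)
_ = pipelines_client.create_run_from_job_spec(_pipeline_definition_file)

If the importer step now succeeds and StatisticsGen runs, the backend fix has reached your project and region.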

Read more comments on GitHub >

Top Results From Across the Web

RuntimeParam support for Transform custom_config. #4140
Today Transform does not support Runtimeparams on the custom config see: ... import KubeflowV2DagRunner, KubeflowV2DagRunnerConfig from ...
Read more >
tfx.v1.orchestration.experimental.KubeflowV2DagRunner
An KubeflowV2DagRunnerConfig object to specify runtime configuration when running the pipeline in Kubeflow. output_dir, An optional output ...
Read more >
Dual deployments on Vertex AI | Google Cloud Blog
In this post, we will cover an end-to-end workflow enabling dual model deployment scenarios using Kubeflow, TensorFlow Extended (TFX), ...
Read more >
TFX ExampleGen from CSV file - wrong split - Stack Overflow
I'm trying to import a CSV file into a TFX pipeline but I see a strange behaviour I don't understand. Here's the file...
Read more >
tfx-helper - PyPI
from tfx_helper.local import LocalPipelineHelper def run() -> None: """Create and ... that won't execute computations minimal_resources = Resources(cpu=1, ...
Read more >
