RuntimeParam support for Transform custom_config.
See original GitHub issue.

System information
- TFX Version (you are using):
- Environment in which you plan to use the feature (e.g., Local (Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc…):
- Are you willing to contribute it (Yes/No):
Describe the feature and the current behavior/state. Today, Transform does not support RuntimeParameters for custom_config; see: https://github.com/tensorflow/tfx/blob/5b3b3fc2c903b76f8c75ba04ac030405428b6160/tfx/components/transform/component.py#L104
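For context, an illustrative sketch of what the request amounts to (this is not the actual TFX source; the function names are hypothetical stand-ins): widening custom_config's annotation from a plain dict to a union that also admits a runtime-parameter placeholder.

```python
# Illustrative sketch only -- not the actual TFX source.
from typing import Any, Dict, Optional, Union


class RuntimeParameter:
    """Stand-in for tfx.orchestration.data_types.RuntimeParameter."""
    def __init__(self, name: str, ptype: type):
        self.name = name
        self.ptype = ptype


# Today (roughly): only a concrete dict is accepted at definition time.
def transform_today(custom_config: Optional[Dict[str, Any]] = None):
    return custom_config


# Requested: also accept a placeholder resolved by the orchestrator at run time.
def transform_requested(
        custom_config: Optional[Union[Dict[str, Any], RuntimeParameter]] = None):
    return custom_config


param = transform_requested(RuntimeParameter("config_transform", str))
```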
Will this change the current API? How? Yes, it will allow the use of RuntimeParameters with Transform.
Who will benefit with this feature? Users who want their pipelines to be dynamic.
Do you have a workaround or are completely blocked by this? We saw the development done for Trainer: https://github.com/tensorflow/tfx/pull/4077/files#diff-1f27fc087329a6364e8a9f610772be8bceb7cf469f25eb56c85ce404c93c5476R199
But we are not sure how to support that for Transform.
Name of your Organization (Optional)
Any other info. Reproducible code:
```python
import json
from typing import Text

from kfp.v2.google import client
from tfx.components import Transform
from tfx.dsl.components.common.importer import Importer
from tfx.extensions.google_cloud_big_query.example_gen import component as big_query_example_gen_component
from tfx.orchestration import pipeline
from tfx.orchestration.data_types import RuntimeParameter
from tfx.orchestration.kubeflow.v2.kubeflow_v2_dag_runner import KubeflowV2DagRunner, KubeflowV2DagRunnerConfig
from tfx.proto import example_gen_pb2
from tfx.types import standard_artifacts

from pipelines.constants import GCS_BUCKET_NAME, GCP_REGION, GOOGLE_CLOUD_PROJECT


def create_full_training_pipeline(pipeline_root: str, _beam_args: dict) -> pipeline.Pipeline:
    config_transform = RuntimeParameter(
        name="config_transform",
        ptype=Text,
    )
    example_gen = big_query_example_gen_component.BigQueryExampleGen(query='select 1') \
        .with_id('ExampleGenR')
    fake_import = Importer('pipo', artifact_type=standard_artifacts.Schema)
    transform = Transform(example_gen.outputs['examples'], fake_import.outputs['result'],
                          module_file='/tmp/test.py', custom_config=config_transform)
    return pipeline.Pipeline(
        beam_pipeline_args=_beam_args,
        pipeline_name="full-training",
        pipeline_root=pipeline_root,
        components=[
            example_gen,
            fake_import,
            transform,
        ],
    )


_temp_location = 'gs://{}/pipeline_tmp/{}'.format(GCS_BUCKET_NAME, 'test')
_beam_pipeline_args = [
    '--runner=DirectRunner',
    '--direct_running_mode=in_memory',
    '--direct_num_workers=0',
    '--temp_location=' + _temp_location,
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--region=' + GCP_REGION,
]

training_pipeline = create_full_training_pipeline(
    pipeline_root=_temp_location,
    _beam_args=_beam_pipeline_args,
)

PIPELINE_DEFINITION_FILE = '/tmp/test'
runner = KubeflowV2DagRunner(
    config=KubeflowV2DagRunnerConfig(),
    output_filename=PIPELINE_DEFINITION_FILE)
_ = runner.run(training_pipeline)

pipelines_client = client.AIPlatformClient(
    project_id=GOOGLE_CLOUD_PROJECT,
    region="europe-west4",
)
pipelines_client.create_run_from_job_spec(
    PIPELINE_DEFINITION_FILE,
    parameter_values={"config_transform": json.dumps({'test': 'coucou'})},
)
```
Raises:
> The pipeline parameter config_transform is not found in the pipeline job input definitions.
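A plausible explanation for this error (our assumption; not confirmed in the thread): the compiler only registers pipeline inputs for exec properties that are still live RuntimeParameter objects, and because Transform JSON-encodes custom_config up front, the placeholder gets baked into a plain string and never surfaces as a job input. A minimal stand-in (hypothetical classes, not TFX internals) illustrating that mechanism:

```python
import json


class RuntimeParameter:
    """Hypothetical stand-in for the TFX runtime-parameter placeholder."""
    def __init__(self, name):
        self.name = name

    def to_json(self):
        return {"__runtime_param__": self.name}


def collect_runtime_params(exec_properties):
    """Compiler-style scan: only live placeholder objects become job inputs."""
    return [v.name for v in exec_properties.values()
            if isinstance(v, RuntimeParameter)]


param = RuntimeParameter("config_transform")

# While the property is still a placeholder object, the scan finds it:
found = collect_runtime_params({"custom_config": param})

# But if custom_config is JSON-encoded before the scan runs, the placeholder
# is baked into a plain string and is no longer visible, so no pipeline input
# is registered -- hence the "not found" error at run submission.
serialized = json.dumps(param.to_json())
missed = collect_runtime_params({"custom_config": serialized})
```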
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 1
- Comments: 9 (4 by maintainers)
Top GitHub Comments
@tanguycdls, I rechecked and you’re correct that this was for Trainer. The feature is still not there. Let us get back to you. Thanks.
Yep, it’s not currently supported. The type hint needs to be changed to also accept RuntimeParameter, so that you can pass custom_config in as a JSON string directly at run time; we will add that in a later version.
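If custom_config does eventually arrive as a raw JSON string at run time, the module file would presumably decode it itself. A hypothetical preprocessing_fn sketch (the custom_config plumbing and the `feature_prefix` key are assumptions for illustration, not part of any TFX API):

```python
import json


def preprocessing_fn(inputs, custom_config=None):
    # Assumption: custom_config may arrive either as the raw JSON string
    # submitted at run time or as an already-parsed dict, so decode defensively.
    cfg = (json.loads(custom_config) if isinstance(custom_config, str)
           else (custom_config or {}))
    # Hypothetical config key driving the feature engineering:
    prefix = cfg.get('feature_prefix', '')
    return {prefix + name: value for name, value in inputs.items()}
```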