LocalDagRunner vs AirflowDagRunner: how to setup Artifact names?
System information
- Linux
- TensorFlow version: 2.8
- TFX Version: 1.8
- Python version: 3.8
Problem
I created a TFX pipeline and ran it with both LocalDagRunner().run(pipe) and AirflowDagRunner(AirflowPipelineConfig(AIRFLOW_ARGS)).run().
I noticed that AirflowDagRunner saves artifacts in the metadata store with names like these:
name
post_transform_stats
pre_transform_stats
statistics
while LocalDagRunner saves artifacts in the metadata store with names like these:
name
ihw_spans:2022-06-28T14:54:32.196073:StatisticsGen:statistics:0
ihw_spans:2022-06-28T14:54:32.196073:Transform:post_transform_stats:0
ihw_spans:2022-06-28T14:54:32.196073:Transform:pre_transform_stats:0
ihw_spans:2022-06-28T15:10:55.220458:StatisticsGen:statistics:0
ihw_spans:2022-06-28T15:10:55.220458:Transform:post_transform_stats:0
ihw_spans:2022-06-28T15:10:55.220458:Transform:pre_transform_stats:0
So when I ingest a new data span via Airflow, I get this error:
ml_metadata.errors.AlreadyExistsError: Given node already exists: type_id: 17
uri: "/home/larion/airflow/tfx/pipelines/ihw_spans/StatisticsGen/statistics/46"
custom_properties {
  key: "name"
  value {
    string_value: "statistics"
  }
}
custom_properties {
  key: "producer_component"
  value {
    string_value: "StatisticsGen"
  }
}
name: "statistics"
This happens because the metadata store can't save the new span's statistics under a name that already exists. When I run the DAG locally a second or third time, everything works fine, because the name includes the run timestamp and is therefore always unique.
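To make the collision concrete, here is a minimal toy sketch in Python. It does not use the real ML Metadata API: ToyStore, its put_artifact method, and local_style_name are hypothetical names invented for illustration, and the only assumption is the one suggested by the error above, namely that the store rejects a second artifact with the same type_id and name.

```python
class AlreadyExistsError(Exception):
    """Stand-in for ml_metadata.errors.AlreadyExistsError (toy version)."""


class ToyStore:
    """Hypothetical in-memory store that rejects duplicate (type_id, name) pairs."""

    def __init__(self):
        self._seen = set()

    def put_artifact(self, type_id, name):
        key = (type_id, name)
        if key in self._seen:
            raise AlreadyExistsError(
                f"Given node already exists: type_id: {type_id} name: {name!r}"
            )
        self._seen.add(key)


def local_style_name(pipeline, run_id, component, output_key, index=0):
    """Build a name in the pattern seen in the LocalDagRunner rows above."""
    return f"{pipeline}:{run_id}:{component}:{output_key}:{index}"


store = ToyStore()

# Airflow-style names: the bare output key repeats on every run.
store.put_artifact(17, "statistics")
try:
    store.put_artifact(17, "statistics")
except AlreadyExistsError as err:
    print("second run fails:", err)

# Local-style names: the run timestamp keeps every name unique.
for run_id in ("2022-06-28T14:54:32.196073", "2022-06-28T15:10:55.220458"):
    store.put_artifact(
        17, local_style_name("ihw_spans", run_id, "StatisticsGen", "statistics")
    )
```

With bare names the second insert is rejected, while the run-scoped names go through on every run.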
Is it possible to configure the naming strategy for pipelines? Is it possible to set the artifact names stored in the metadata.Artifact table when creating components?
I don't understand how I can trigger the TFX pipeline a second or third time via Airflow while this error occurs.
Top GitHub Comments
@jiyongjung0, @1025KB
I noticed a similar issue raised on Stack Overflow yesterday. Please refer to this link. Thanks!
@Daard,
Could you please try the suggested solution by running the following command after installing TFX, and let us know if it works?
sed -i 's/artifact.name = name/artifact.name = f"{name}:{pipeline_info.run_id}"/' /opt/conda/lib/python3.7/site-packages/tfx/dsl/components/base/base_driver.py
Thank you!
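The sed command above hardcodes a python3.7 conda path, which may not match every environment. As a rough sketch only, the same substitution could be done from Python by locating the installed module first. The helper names patch_source, patch_file, and locate_base_driver are hypothetical, and the sketch assumes the target line in base_driver.py still reads exactly as the sed pattern expects.

```python
import importlib.util
from pathlib import Path

# Same search and replacement strings as the sed command above.
OLD = "artifact.name = name"
NEW = 'artifact.name = f"{name}:{pipeline_info.run_id}"'


def patch_source(source: str) -> str:
    """Replace the first occurrence of OLD on each line, like sed's s/// without g."""
    return "\n".join(line.replace(OLD, NEW, 1) for line in source.split("\n"))


def locate_base_driver() -> Path:
    """Find base_driver.py in the active environment (requires TFX to be installed)."""
    spec = importlib.util.find_spec("tfx.dsl.components.base.base_driver")
    if spec is None or spec.origin is None:
        raise FileNotFoundError("tfx base_driver module not found")
    return Path(spec.origin)


def patch_file(path: Path) -> bool:
    """Patch the file in place; return True if anything changed."""
    original = path.read_text()
    patched = patch_source(original)
    if patched == original:
        return False
    path.write_text(patched)
    return True
```

Calling patch_file(locate_base_driver()) would apply the change to whichever TFX installation the current interpreter sees. Running it a second time is a no-op, because the patched line no longer contains the original text.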