TFX CLI with Kubeflow / AI Platform Pipelines runtime context missing when output is taken from cache
System information
- Have I specified the code to reproduce the issue (Yes/No): no (Taxicab example works well)
- Environment in which the code is executed (e.g., Local (Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): AI Platform Pipelines
- TensorFlow version (you are using): /
- TFX Version: 0.28.0
- Python version: 3.7 (per the traceback paths below)
Describe the current behavior
The pipeline deployed with the TFX CLI runs into the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 360, in <module>
    main()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 353, in main
    execution_info = launcher.launch()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py", line 198, in launch
    self._exec_properties)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py", line 167, in _run_driver
    component_info=self._component_info)
  File "/opt/conda/lib/python3.7/site-packages/tfx/dsl/components/base/base_driver.py", line 270, in pre_execution
    driver_args, pipeline_info)
  File "/opt/conda/lib/python3.7/site-packages/tfx/dsl/components/base/base_driver.py", line 158, in resolve_input_artifacts
    producer_component_id=input_channel.producer_component_id)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/metadata.py", line 948, in search_artifacts
    pipeline_info)
RuntimeError: Pipeline run context for PipelineInfo(pipeline_name: sentiment4, pipeline_root: gs://sascha-playground-doit-kubeflowpipelines-default/sentiment4, run_id: sentiment4-qnknl) does not exist
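The traceback shows where this happens: while resolving input artifacts for the newly added component, the driver's search_artifacts() call fails to find a pipeline run context in ML Metadata (MLMD) for the run that produced the cached output. A hedged way to inspect this, assuming a reachable MLMD gRPC endpoint (host/port are placeholders, and 'run' as the run-context type name is an assumption about TFX 0.28's internals):

from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Placeholder connection details for the cluster's MLMD gRPC service.
config = metadata_store_pb2.MetadataStoreClientConfig(
    host='<mlmd-grpc-host>', port=8080)
store = metadata_store.MetadataStore(config)

# List run contexts: if none corresponds to the run that produced the
# cached artifact, search_artifacts() raises the RuntimeError above.
for context in store.get_contexts_by_type('run'):
  print(context.name)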
[Screenshots from the original issue: the first run, and the second run with the additional component.]
Steps to reproduce
- deploy the pipeline with one component
- run the pipeline with one component (👍 works)
- add another component
- run the pipeline again; this time the first component's output is taken from the cache (👎 fails)
My assumption is that the second component cannot resolve the cached data because it did not exist in the first run.
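For illustration, a minimal sketch of the setup. Everything here is assumed: the GCS paths are placeholders, and CsvExampleGen/StatisticsGen stand in for the actual components, which the issue does not show.

# kubeflow_runner.py -- the first deployment has one component; the
# second deployment adds a consumer of the first component's output,
# which is then served from the cache.
from tfx.components import CsvExampleGen, StatisticsGen
from tfx.orchestration import pipeline
from tfx.orchestration.kubeflow import kubeflow_dag_runner

PIPELINE_NAME = 'sentiment4'                # name from the error message
PIPELINE_ROOT = 'gs://<bucket>/sentiment4'  # placeholder pipeline root

def create_pipeline():
  example_gen = CsvExampleGen(input_base='gs://<bucket>/data')  # placeholder
  components = [example_gen]
  # Second deployment: uncomment to add the new downstream component.
  # statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
  # components.append(statistics_gen)
  return pipeline.Pipeline(
      pipeline_name=PIPELINE_NAME,
      pipeline_root=PIPELINE_ROOT,
      components=components,
      enable_cache=True,  # caching is what triggers the failure
  )

if __name__ == '__main__':
  kubeflow_dag_runner.KubeflowDagRunner().run(create_pipeline())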
Describe the expected behavior
The second component gets executed.
Standalone code to reproduce the issue
The Taxicab sample works fine as a test case; a sketch of the CLI sequence is below.
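A sketch of the TFX CLI sequence that reproduces this, assuming a kubeflow_runner.py like the sketch above and a placeholder AI Platform Pipelines endpoint:

# First deployment and run: succeeds.
tfx pipeline create --engine=kubeflow --pipeline-path=kubeflow_runner.py --endpoint=<endpoint>
tfx run create --engine=kubeflow --pipeline-name=sentiment4 --endpoint=<endpoint>
# Add the second component to kubeflow_runner.py, then update and re-run:
tfx pipeline update --engine=kubeflow --pipeline-path=kubeflow_runner.py --endpoint=<endpoint>
tfx run create --engine=kubeflow --pipeline-name=sentiment4 --endpoint=<endpoint>  # fails with the traceback above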
Name of your Organization (Optional) /
Other info / logs
Top GitHub Comments
Per https://github.com/kubeflow/pipelines/issues/5303#issuecomment-851904651, the bug is fixed in KFP 1.6.0. The root cause is that the KFP cache server used a hacky way to detect TFX pods, and that detection broke in newer TFX versions.
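Until a cluster can move to KFP 1.6.0, one possible mitigation (an assumption based on KFP's documented per-pod cache opt-out label, not something suggested in this thread) is to have TFX label its pods so the KFP cache server skips them:

# Hedged sketch: opt TFX pods out of KFP's cache server via the
# 'pipelines.kubeflow.org/cache_enabled' pod label.
from kfp import dsl
from tfx.orchestration.kubeflow import kubeflow_dag_runner

def _disable_kfp_caching(container_op: dsl.ContainerOp):
  # "false" tells the KFP cache webhook to skip this pod.
  container_op.add_pod_label('pipelines.kubeflow.org/cache_enabled', 'false')

config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    pipeline_operator_funcs=(
        kubeflow_dag_runner.get_default_pipeline_operator_funcs()
        + [_disable_kfp_caching]))
runner = kubeflow_dag_runner.KubeflowDagRunner(config=config)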
I am working on releasing KFP 1.6.0+ to mkp (Google Cloud Marketplace).