Caching in tfx and kfp
I’m confused about caching when running a TFX pipeline on KFP.
- I have KFP 1.0.4 deployed via GCP AI platform and I’ve been using both TFX 0.25 and 0.26 when trying this.
- I created my pipeline with enable_cache=False.
- I ran it once and each component ran as expected and produced the expected artifacts (I only checked the artifact buckets, not the SQL database).
- I ran it again with no change in inputs or parameters and it’s now using cached results even though I disabled the cache. The following images show the logs for the second run where I would have expected them to run again.
- I repeated this for an entirely new deployment of KFP for both cases.
With TFX 0.25
With TFX 0.26
Questions:
- Why is enable_cache=False not respected?
- KFP documentation mentions that its caching mechanisms should not be used for TFX pipelines. Why am I seeing a message about a cached step from KFP in the TFX 0.26 case rather than from the TFX component drivers?
- Can you enable/disable caching on a per component basis?
- Is it possible to get some logs for why a cached result was used / not used?
- Is there any way to get TFX to also consider whether the Docker image has changed when determining if a cached result is invalidated?
  - I’m guessing no, since it’s determined by the driver, which would run inside the (updated) container?
  - Would the custom container based component handle this differently?
- Is there a better place to ask these kinds of questions?
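
A possible workaround for the first two questions, as a minimal sketch rather than a confirmed fix: the KFP SDK lets a step opt out of KFP's own step-level cache by setting max_cache_staleness to "P0D" on its execution options. The helper name disable_kfp_step_cache is hypothetical, and the stand-in object below only mimics the attribute path of a kfp.dsl.ContainerOp so the sketch runs without KFP installed. Whether a KFP 1.0.4 backend honors this setting is an assumption I have not verified.

```python
from types import SimpleNamespace

# ISO 8601 duration of zero days: any cached result counts as already stale.
NO_CACHE_STALENESS = "P0D"


def disable_kfp_step_cache(container_op):
    """Hypothetical helper: mark one KFP step so it never reuses a cached result."""
    container_op.execution_options.caching_strategy.max_cache_staleness = NO_CACHE_STALENESS
    # Return the op so the helper can be used as an op transformer.
    return container_op


# Stand-in for a kfp.dsl.ContainerOp (assumption: same attribute path as the real class).
fake_op = SimpleNamespace(
    execution_options=SimpleNamespace(
        caching_strategy=SimpleNamespace(max_cache_staleness=None)
    )
)

disable_kfp_step_cache(fake_op)
print(fake_op.execution_options.caching_strategy.max_cache_staleness)  # prints P0D
```

With TFX, a function like this could presumably be appended to the list passed as pipeline_operator_funcs in KubeflowDagRunnerConfig (for example, after the defaults from kubeflow_dag_runner.get_default_pipeline_operator_funcs()), so that it is applied to every generated step. This only addresses KFP's cache; TFX's own driver-level caching is still controlled by enable_cache.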
Top GitHub Comments
Something like this works for me:
@johnPertoft this is also a blocker for me. We will have to pivot towards vanilla Kubeflow, which really isn’t that bad to use. I will miss the interactive runner, but using kfp is pretty easy. It would be nice to have Kubeflow components for each of the TFX steps. That will probably happen eventually.