TFX nightly 0.29.0.dev20210328 creates pipeline defs too large for KubeFlow
- Have I specified the code to reproduce the issue (Yes/No): Should be visible in a pipeline compile test
- Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Kubeflow 1.1.0
- TensorFlow version (you are using): Whichever is bundled with the TFX version
- TFX Version: 0.29.0.dev20210328
- Python version: 3.7.3
Describe the current behavior
After moving from TFX 0.28.0 to the nightly build (to test the fix for this issue: https://github.com/tensorflow/tfx/issues/3272), the pipeline defs have grown by more than a factor of 20. I have a pipeline with 29 components which was 272KB uncompressed when compiled with TFX 0.28.0, but with the nightly build it is now 6.2MB, which is almost 23 times larger. This causes KubeFlow to fail because the pipeline exceeds the gRPC message size limit of the KFP APIs.
From KubeFlow:
{"error":"grpc: received message larger than max (5429408 vs. 4194304)","message":"grpc: received message larger than max (5429408 vs. 4194304)","code":8}
This seems to stem from the fact that TFX now uses the IR (intermediate representation), which seems to be extremely verbose in comparison.
Describe the expected behavior
Not a factor of 20 increase in size.
Standalone code to reproduce the issue
Just compare the sizes of the outputs from the kubeflow_dag_runner in TFX 0.28.0 and the latest nightly.
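A rough way to make that comparison is sketched below. It uses only the Python standard library and assumes the runner output is either a .tar.gz archive or a plain YAML file. The limit is the 4194304 bytes from the error message above. Comparing the uncompressed size against that limit is an assumption, since the issue does not say exactly which payload the KFP API measures.

```python
import os
import sys
import tarfile

# 4 MiB: the limit quoted in the Kubeflow error message (4194304 bytes).
GRPC_MAX_BYTES = 4 * 1024 * 1024


def package_sizes(path):
    """Return (bytes on disk, total uncompressed bytes) for a pipeline package."""
    on_disk = os.path.getsize(path)
    if tarfile.is_tarfile(path):
        # Archive: sum the sizes of the regular files it contains.
        with tarfile.open(path) as tar:
            uncompressed = sum(m.size for m in tar.getmembers() if m.isfile())
    else:
        # Plain YAML (or any non-archive file): nothing to unpack.
        uncompressed = on_disk
    return on_disk, uncompressed


def report(path):
    """Print both sizes and return True if the uncompressed size exceeds the limit."""
    on_disk, uncompressed = package_sizes(path)
    over = uncompressed > GRPC_MAX_BYTES
    verdict = "over" if over else "within"
    print(f"{path}: {on_disk} B on disk, {uncompressed} B uncompressed, "
          f"{verdict} the {GRPC_MAX_BYTES} B limit")
    return over


if __name__ == "__main__":
    # Pass the compiled packages as arguments; the file names are up to you.
    for package_path in sys.argv[1:]:
        report(package_path)
```

Running it once on the package compiled with 0.28.0 and once on the package compiled with the nightly should show the size difference directly.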
Other info: My 3 component pipeline increased in size from 21KB to 78KB and my 3 component pipeline increased in size from 36KB to 272KB.
It seems that the IR now stores information about all components for each component. This means that the size of the pipeline definition grows roughly quadratically, rather than linearly, with the number of nodes.
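One way to sanity-check this against the reported numbers is the small calculation below. It compares the observed growth factors with a toy model in which every node carries a copy of every node's spec, so total size scales with the square of the node count. The toy model and the decimal units (1 MB = 1000 KB) are assumptions made for illustration, not a description of the actual IR layout. The second pipeline mentioned under "Other info" is left out because its component count is unclear.

```python
# Sizes as reported in this issue; decimal units are an assumption.
KB = 1000
MB = 1000 * KB

reported = {
    # node count: (size with TFX 0.28.0, size with the nightly build)
    3: (21 * KB, 78 * KB),
    29: (272 * KB, 6.2 * MB),
}


def growth_factor(old_bytes, new_bytes):
    """How many times larger the new pipeline definition is."""
    return new_bytes / old_bytes


def toy_size(n_nodes, per_node_bytes, embed_all):
    """Toy model: each node stores its own spec, or a copy of every node's spec."""
    copies_per_node = n_nodes if embed_all else 1
    return n_nodes * copies_per_node * per_node_bytes


for n_nodes, (old, new) in reported.items():
    observed = growth_factor(old, new)
    modelled = toy_size(n_nodes, 1, True) / toy_size(n_nodes, 1, False)
    print(f"{n_nodes} nodes: observed x{observed:.1f}, toy model x{modelled:.1f}")
```

In the toy model the growth factor equals the node count. That is the same order of magnitude as the observed factors for the 3 and 29 component pipelines, which fits every node embedding the whole pipeline better than it fits a fixed per-node overhead.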
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 5
- Comments: 17 (12 by maintainers)
Top GitHub Comments
Any thoughts on a timeline to resolve this? This should probably be listed under Breaking Changes in the release document.
I feel like even that isn’t enough (not to mention feasible!) since even the “vanilla” TFX taxi pipeline can no longer run in KubeFlow!