TFX nightly 0.29.0.dev20210328 creates pipeline defs too large for KubeFlow

  • Have I specified the code to reproduce the issue (Yes/No): Should be visible in a pipeline compile test
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Kubeflow 1.1.0
  • TensorFlow version (you are using): Whichever is bundled with the TFX version
  • TFX Version: 0.29.0.dev20210328
  • Python version: 3.7.3

Describe the current behavior: After moving from TFX 0.28.0 to the nightly build (to test the fix for this issue: https://github.com/tensorflow/tfx/issues/3272), the pipeline defs have grown by more than a factor of 20. I have a pipeline with 29 components which was 272KB uncompressed when compiled with TFX 0.28.0, but with the nightly build it is now 6.2MB, almost 23 times larger. This causes KubeFlow to fail because the request exceeds the gRPC message size limit of the KFP APIs.

From KubeFlow: {"error":"grpc: received message larger than max (5429408 vs. 4194304)","message":"grpc: received message larger than max (5429408 vs. 4194304)","code":8}

This seems to stem from the fact that TFX now uses the IR (intermediate representation), which seems to be extremely verbose in comparison.

Describe the expected behavior: Not a 20-fold increase in size.

Standalone code to reproduce the issue: Just compare the sizes of the outputs from the kubeflow_dag_runner in TFX 0.28.0 and in the latest nightly.
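
For anyone who wants to make that comparison concrete, here is a minimal sketch using only the Python standard library. It assumes the compiled pipeline packages are `.tar.gz` archives; the function names are hypothetical and not part of TFX or KFP. The limit constant is the value from the KubeFlow error message above. Note that the uncompressed package size is only a rough proxy for the size of the gRPC message KFP actually sends.

```python
import tarfile

# Limit taken from the KubeFlow error message (4194304 bytes = 4 MiB).
GRPC_MAX_BYTES = 4194304


def uncompressed_size(package_path: str) -> int:
    """Sum the sizes of all regular files inside a .tar.gz pipeline package."""
    with tarfile.open(package_path, "r:gz") as tar:
        return sum(member.size for member in tar.getmembers() if member.isfile())


def growth_factor(old_package: str, new_package: str) -> float:
    """How many times larger the new package is than the old one."""
    return uncompressed_size(new_package) / uncompressed_size(old_package)


def exceeds_grpc_limit(package_path: str, limit: int = GRPC_MAX_BYTES) -> bool:
    """Rough check: is the uncompressed definition larger than the limit?"""
    return uncompressed_size(package_path) > limit
```

Calling `growth_factor("pipeline_0.28.0.tar.gz", "pipeline_nightly.tar.gz")` (both file names are placeholders) on packages compiled from the same pipeline code would give the blow-up factor described in this issue.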

Other info: My 3 component pipeline increased in size from 21KB to 78KB and my 3 component pipeline increased in size from 36KB to 272KB.

It seems as if the IR now saves all information about all components for each component. This means that the size of the pipeline grows roughly quadratically with the number of nodes.
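
A toy model of that hypothesis, purely for illustration: if every node carries a copy of the description of every node, the total size scales with the square of the node count. The function names and the fixed per-node size are assumptions made for this sketch; it does not reproduce how TFX actually serializes the IR.

```python
def flat_size(num_nodes: int, node_bytes: int) -> int:
    """Each node is described once: total size grows linearly."""
    return num_nodes * node_bytes


def duplicated_size(num_nodes: int, node_bytes: int) -> int:
    """Each node embeds a description of every node: total grows quadratically."""
    return num_nodes * flat_size(num_nodes, node_bytes)


for n in (3, 29):
    # Blow-up factor of the duplicated layout over the flat one.
    print(n, duplicated_size(n, 1) / flat_size(n, 1))
```

Under this model the blow-up factor equals the node count (3 and 29 here), which is in the same range as the reported increases of about 3.7x for the 3-component pipeline and about 23x for the 29-component one. That is only a rough consistency check, not proof of the cause.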

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 5
  • Comments: 17 (12 by maintainers)

Top GitHub Comments

4 reactions
axeltidemann commented, Jun 17, 2021

Any thoughts on a timeline to resolve this? This should probably be listed under Breaking Changes in the release document.

1 reaction
ConverJens commented, Jun 18, 2021

I feel like even that isn’t enough (not to mention feasible!) since even the “vanilla” TFX taxi pipeline can no longer run in KubeFlow!
