
BigQueryExampleGen fails on Kubeflow Pipelines when using long queries


System information

  • Have I specified the code to reproduce the issue (Yes, No): Yes
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): MacOS
  • TensorFlow version: tensorflow==2.4.2
  • TFX Version: 0.26.3 (the master branch may also be affected)
  • Python version: Python 3.7.7
  • Python dependencies (from pip freeze output): Irrelevant

Describe the current behavior

When using long queries, BigQueryExampleGen fails on Kubeflow Pipelines because the tfx_ir / serialized_component argument becomes too long. The error is:

standard_init_linux.go:211: exec user process caused "argument list too long"

This is a blocker for us.

Describe the expected behavior

BigQueryExampleGen should not fail on Kubeflow Pipelines, even when using long queries.

Standalone code to reproduce the issue

Run BigQueryExampleGen with a query that selects ~500 features. This makes the tfx_ir / serialized_component string exceed ~131k characters.
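The ~131k figure matches a Linux kernel limit. A single command-line argument, including its terminating NUL byte, may not exceed MAX_ARG_STRLEN, which is 32 pages (131,072 bytes with 4 KiB pages). This limit is separate from the one on the combined size of all arguments, so a single oversized flag is enough to make exec fail with "argument list too long". Below is a minimal sketch of a client-side check, assuming 4 KiB pages. `arg_fits` and `oversized_args` are hypothetical helpers and are not part of TFX or KFP:

```python
from typing import List

# Assumption: 4 KiB pages, the usual size on x86_64 nodes.
PAGE_SIZE = 4096
# Linux limit on a single argv/envp string: MAX_ARG_STRLEN = 32 * PAGE_SIZE.
MAX_ARG_STRLEN = 32 * PAGE_SIZE  # 131,072 bytes


def arg_fits(value: str) -> bool:
    """Return True if `value` can be passed to exec as one argument."""
    # The kernel counts bytes (not characters) plus the terminating NUL.
    return len(value.encode("utf-8")) + 1 <= MAX_ARG_STRLEN


def oversized_args(args: List[str]) -> List[int]:
    """Return the indices of arguments that would make exec fail with E2BIG."""
    return [i for i, arg in enumerate(args) if not arg_fits(arg)]


if __name__ == "__main__":
    # Hypothetical container args: a short flag name followed by a huge payload.
    args = ["--serialized_component", "x" * 200_000]
    print(oversized_args(args))  # [1]
```

Because the check counts UTF-8 bytes, non-ASCII characters in a query use up the budget faster than a character count would suggest.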

Name of your Organization (Optional)

Twitter

Other info / logs

This (TFX IR exceeding the flag size limit) is a known issue, and there is a TODO to fix it in the logic that converts TFX components to KFP operators. The suggested fix is to write the IR to the pipeline_root and have container_entrypoint.py read it back. A PR that could have resolved this (https://github.com/tensorflow/tfx/pull/3842) was marked stale and automatically closed; the relevant changes from that PR are in https://github.com/tensorflow/tfx/pull/4298. A separate PR that removes extra node information from the generated IR (https://github.com/tensorflow/tfx/pull/3992) was merged after that fix was proposed, and I'm not sure whether those changes would still be necessary once the IR is persisted to a file instead of passed as a string.
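The suggested fix amounts to passing a reference to the IR instead of the IR itself. The sketch below illustrates the idea with local files. `persist_ir`, `build_container_args`, `load_ir`, the `_tfx_ir` directory and the `--tfx_ir_path` flag are all hypothetical and do not reflect what container_entrypoint.py actually accepts. On a real cluster, pipeline_root is usually a gs:// path, so the file I/O would go through something like tf.io.gfile.GFile rather than the built-in open.

```python
import hashlib
import os
import tempfile
from typing import List


def persist_ir(ir_bytes: bytes, pipeline_root: str, node_id: str) -> str:
    """Write the serialized IR under pipeline_root and return its path.

    Hypothetical helper: the directory layout and file naming are assumptions.
    """
    # Name the file by content hash so different IRs for the same node never collide.
    digest = hashlib.sha256(ir_bytes).hexdigest()[:16]
    ir_dir = os.path.join(pipeline_root, "_tfx_ir", node_id)
    os.makedirs(ir_dir, exist_ok=True)
    path = os.path.join(ir_dir, f"{digest}.pb")
    with open(path, "wb") as f:
        f.write(ir_bytes)
    return path


def build_container_args(ir_path: str) -> List[str]:
    """Put only the short path on the command line (hypothetical flag name)."""
    return ["--tfx_ir_path", ir_path]


def load_ir(argv: List[str]) -> bytes:
    """Entrypoint side: find the hypothetical flag and read the IR back."""
    ir_path = argv[argv.index("--tfx_ir_path") + 1]
    with open(ir_path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    fake_ir = b"x" * 200_000  # stand-in for a serialized IR above 131,072 bytes
    with tempfile.TemporaryDirectory() as root:
        args = build_container_args(persist_ir(fake_ir, root, "BigQueryExampleGen"))
        assert load_ir(args) == fake_ir
        # Only the path length reaches exec, far below the per-argument limit.
        print(max(len(arg) for arg in args))
```

With this approach, the command-line length no longer depends on the query size. The tradeoff is an extra read from storage when the container starts.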

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 8
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

3 reactions
iain-stitt-by commented, Oct 14, 2021

We faced the same issue and ended up creating a custom component that accepts a GCS path to a text file instead of the raw SQL string. That way, only the path to the file is encoded in the "input_config" in the TFX IR. We modified the Executor to read the text file and then use the SQL string in the same way as the current BaseExampleGenExecutor. The fix that @codesue linked looks like a better, more generic solution to this problem, though.
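The workaround described in this comment can be sketched as an executor-side step that replaces each path with the SQL it points to before the query runs. This sketch assumes the split configuration looks like the name/pattern pairs of example_gen_pb2.Input.Split and uses plain dicts as a stand-in. `resolve_query_splits` is a hypothetical helper, not the commenter's code or a TFX API. For gs:// paths, tf.io.gfile.GFile could be passed as `opener`.

```python
import os
import tempfile
from typing import IO, Callable, Dict, List


def resolve_query_splits(
    splits: List[Dict[str, str]],
    opener: Callable[[str], IO[str]] = open,
) -> List[Dict[str, str]]:
    """Replace each split's pattern (a path to a .sql file) with the SQL it contains.

    `splits` mimics the name/pattern fields of example_gen_pb2.Input.Split.
    """
    resolved = []
    for split in splits:
        with opener(split["pattern"]) as f:
            query = f.read().strip()
        if not query:
            raise ValueError(f"Query file for split {split['name']!r} is empty")
        resolved.append({"name": split["name"], "pattern": query})
    return resolved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.sql")
        with open(path, "w") as f:
            f.write("SELECT feature_a, feature_b, label FROM `project.dataset.table`\n")
        # Only this short path would be stored in the IR; the SQL is read at run time.
        print(resolve_query_splits([{"name": "train", "pattern": path}]))
```

The resolved queries could then be handed to whatever query-to-examples logic the executor already uses, so the rest of the component behaves as it did before.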

1 reaction
iain-stitt-by commented, Mar 18, 2022

Sure @rcrowe-google, we can look into adding some of our custom components to the TFX-Addons project.


