Dataflow jobs fail when using the same beam_pipeline_args for InteractiveContext and KubeflowDagRunner

System information

  • Have I specified the code to reproduce the issue (Yes, No): Yes
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Kubeflow Notebooks
  • TensorFlow version: 2.5.0
  • TFX Version: 1.2.1
  • Python version: 3.7
  • Python dependencies (from pip freeze output): …

Describe the current behavior

Typically, we define a BEAM_PIPELINE_ARGS constant and pass it to both InteractiveContext and a Pipeline via beam_pipeline_args during experimentation in a notebook. When it runs a component, InteractiveContext adds labels to the beam_pipeline_args list passed to it. If this same list is then passed to a pipeline that runs on Kubeflow Pipelines, the pipeline's components fail on Dataflow because the labels list is malformed.

The Dataflow job fails with the following error:

Encountered an invalid user label. generic::invalid_argument: Invalid field "user_labels";
key "['tfx_executor" does not conform to regular expression "[\p{Ll}\p{Lo}][\p{Ll}\p{Lo}\p{N}_-]{0,62}";
first character "[" is not a non-uppercased letter (Unicode character class Ll or Lo)
[google.rpc.error_details_ext] { message: "Invalid field \"user_labels\"; key \"[\'tfx_executor\"
does not conform to regular expression \"[\\p{Ll}\\p{Lo}][\\p{Ll}\\p{Lo}\\p{N}_-]{0,62}\";
first character \"[\" is not a non-uppercased letter (Unicode character class Ll or Lo)" }
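To see why the key "['tfx_executor" is rejected, the rule quoted in the error message can be approximated locally. This is only a sketch: it assumes the regular expression in the error is the complete rule for label keys, and `is_valid_label_key` is a hypothetical helper, not part of TFX, Beam, or any Google Cloud client. Python's `re` module has no `\p{...}` classes, so the check uses `unicodedata` instead.

```python
import unicodedata


def _is_lower_or_other_letter(ch: str) -> bool:
    # Unicode classes Ll (lowercase letter) and Lo (other letter).
    return unicodedata.category(ch) in ("Ll", "Lo")


def is_valid_label_key(key: str) -> bool:
    """Approximate [\\p{Ll}\\p{Lo}][\\p{Ll}\\p{Lo}\\p{N}_-]{0,62} (hypothetical helper)."""
    if not 1 <= len(key) <= 63:
        return False
    if not _is_lower_or_other_letter(key[0]):
        return False
    for ch in key[1:]:
        is_number = unicodedata.category(ch).startswith("N")
        if not (_is_lower_or_other_letter(ch) or is_number or ch in "_-"):
            return False
    return True


good_key = is_valid_label_key("tfx_executor")    # True
bad_key = is_valid_label_key("['tfx_executor")   # False: "[" is not in Ll or Lo
```

The second call fails on the first character, which matches the message above: the label key begins with the "[" of a stringified Python list rather than with a lowercase letter.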

The list of labels looks like this, for example:

["['tfx_executor=third_party_executor', 'tfx_py_version=3-7', 'tfx_runner=interactivecontext',
'tfx_version=1-2-1', 'tfx_executor=tfx-components-transform-executor-executor', 'tfx_py_version=3-7',
'tfx_runner=interactivecontext', 'tfx_version=1-2-1']", 'tfx_executor=third_party_executor',
'tfx_py_version=3-7', 'tfx_runner=kfp', 'tfx_version=1-2-1']

The list of labels is expected to look like this instead:

['tfx_executor=third_party_executor', 'tfx_py_version=3-7', 'tfx_runner=kfp', 'tfx_version=1-2-1']

This error seems limited to running a component with InteractiveContext then running it on Kubeflow Pipelines:

  • Running a component with InteractiveContext then running it using LocalDagRunner works as expected.
  • Running a component with LocalDagRunner only (without InteractiveContext), then running it on Kubeflow Pipelines, works as expected.

Describe the expected behavior

Running components on Dataflow via Kubeflow Pipelines after running them with InteractiveContext should work without error. InteractiveContext shouldn’t mutate the beam_pipeline_args list passed to it; it should make a copy of the list instead.
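The difference between mutating the caller's list and working on a copy can be shown without TFX at all. The sketch below uses two hypothetical stand-in classes (`AliasingContext` and `CopyingContext`) and a made-up `EXTRA_LABELS` list; none of these names exist in TFX, and the code makes no claim about how InteractiveContext is actually implemented. The point is only that `a or b` returns one of the original list objects, while `list(a or b)` returns a new list.

```python
from typing import List, Optional

# Hypothetical stand-in for whatever labels get appended at run time.
EXTRA_LABELS = ["tfx_runner=interactivecontext"]


class AliasingContext:
    """Hypothetical context that keeps a reference to the caller's list."""

    def __init__(self, beam_pipeline_args: Optional[List[str]] = None):
        self.beam_pipeline_args = beam_pipeline_args or []

    def run(self, beam_pipeline_args: Optional[List[str]] = None) -> List[str]:
        # `or` returns one of its operands, so `args` is the caller's list.
        args = beam_pipeline_args or self.beam_pipeline_args
        args.extend(EXTRA_LABELS)  # mutates the caller's list in place
        return args


class CopyingContext(AliasingContext):
    """Same stand-in, but it copies the list before modifying it."""

    def run(self, beam_pipeline_args: Optional[List[str]] = None) -> List[str]:
        # list(...) makes a shallow copy, so the caller's list is untouched.
        args = list(beam_pipeline_args or self.beam_pipeline_args)
        args.extend(EXTRA_LABELS)
        return args


shared = ["--runner=DataflowRunner"]
AliasingContext(shared).run()
# shared is now ['--runner=DataflowRunner', 'tfx_runner=interactivecontext']

fresh = ["--runner=DataflowRunner"]
CopyingContext(fresh).run()
# fresh is still ['--runner=DataflowRunner']
```

Until a fix lands in the library, a caller can likely get the same protection by passing `list(BEAM_PIPELINE_ARGS)` to each consumer, so that no two consumers share one list object.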

Standalone code to reproduce the issue

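from tfx.extensions.google_cloud_big_query.example_gen.component import BigQueryExampleGen
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from tfx.orchestration.pipeline import Pipeline

# The same list object is passed to both InteractiveContext and Pipeline.
BEAM_PIPELINE_ARGS = ['--runner=DataflowRunner', ...]
example_gen = BigQueryExampleGen(...)
context = InteractiveContext(beam_pipeline_args=BEAM_PIPELINE_ARGS)
pipeline = Pipeline(beam_pipeline_args=BEAM_PIPELINE_ARGS, components=[example_gen])
context.run(example_gen)  # per this report, labels are added to BEAM_PIPELINE_ARGS here
# compile the pipeline with KubeflowDagRunner and run it using a kfp.Client method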

Name of your Organization (Optional)

Twitter

Other info / logs

We suspect the mutation happens around this line: https://github.com/tensorflow/tfx/blob/master/tfx/dsl/components/base/base_beam_executor.py#L88

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (7 by maintainers)

Top GitHub Comments

1 reaction
codesue commented, Aug 12, 2022

Changing the line to beam_pipeline_args = list(beam_pipeline_args or self.beam_pipeline_args) fixes this. The original list is no longer modified, and the labels are now well-formed.

1 reaction
charlesccychen commented, Aug 12, 2022

Thanks! Could you check whether changing the following line fixes this?

Specifically, change this line:

https://github.com/tensorflow/tfx/blob/6429c643233f1c1fca41c7c02e7da966c763c7eb/tfx/orchestration/experimental/interactive/interactive_context.py#L139

to

beam_pipeline_args = list(beam_pipeline_args or self.beam_pipeline_args)