Transform component, running beam with Flink Runner. Flink job fails with ModuleNotFoundError.

System information

Have I specified the code to reproduce the issue (Yes, No): No
Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Linux
TensorFlow version: 2.5.3
TFX Version: 1.2.0
Python version: 3.7
Python dependencies (from pip freeze output):

Docker image for SDK workers built from:

FROM apache/beam_python3.7_sdk:2.39.0
tfx==1.2.0
tensorflow==2.5.3

Describe the current behavior

I am running a TFX pipeline with the FlinkRunner for Beam. The Transform component manages to start the Flink job, but the job fails immediately with ModuleNotFoundError: No module named 'pipeline_functions'.
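
One quick way to narrow down an error like this is to check, from inside the SDK worker image, whether the module can be located at all. The sketch below uses only the standard library; the helper name `module_available` is hypothetical, and `pipeline_functions` is simply the module name from the error above.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent
        # package, or for modules whose __spec__ is unset.
        return False


if __name__ == "__main__":
    # Run this inside the worker container, not on the submitting machine.
    for mod in ("apache_beam", "tfx", "pipeline_functions"):
        status = "found" if module_available(mod) else "MISSING"
        print(f"{mod}: {status}")
```

If the module is reported missing in the worker image, the user code was never installed or staged there, which would be consistent with the failure described.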

Where can I find the code responsible for adding user code as --extra_package, or for actually installing the wheel that contains my user code? I was not able to locate it in the repository…
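
For context, Beam's Python SDK accepts `--extra_package` as a pipeline option for staging a local package file. A minimal sketch of assembling such arguments for a Flink job follows. The helper `build_beam_args`, the Flink master address, the worker image name, and the wheel path are all placeholder assumptions, not values from this issue. Whether TFX 1.2.0 forwards or overrides these options for Transform is exactly the open question here.

```python
from typing import List, Optional


def build_beam_args(flink_master: str,
                    worker_image: str,
                    extra_package: Optional[str] = None) -> List[str]:
    """Assemble Beam pipeline args for a Flink job (hypothetical helper)."""
    args = [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        # DOCKER tells Beam to start SDK workers from the given image.
        "--environment_type=DOCKER",
        f"--environment_config={worker_image}",
    ]
    if extra_package:
        # Asks Beam to stage this local wheel or sdist for the workers.
        args.append(f"--extra_package={extra_package}")
    return args


# Placeholder values for illustration only.
beam_args = build_beam_args(
    flink_master="localhost:8081",
    worker_image="my-registry/beam-tfx-worker:latest",
    extra_package="dist/pipeline_functions-0.1-py3-none-any.whl",
)
print(beam_args)
```

A list like this would typically be handed to the pipeline as `beam_pipeline_args`; treat the specific combination of options as an assumption to verify against your Beam version.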

Also, is this comment still current, given my versions of TFX and Beam?

Should I disable the wheel packaging and set force_tf_compat_v1=True in the Transform component? If so, what are the implications?

More broadly, I was hoping to start a discussion with other TFX users who might have a similar setup. We are moving from running TFX pipelines with GCS for storage and Dataflow runners for Beam to running them with HDFS for storage and Flink runners for Beam. The journey is tortuous, and I would like to hear from others who are willing to share their experience.

Thank you!

Issue Analytics

State:
Created a year ago
Comments:11 (6 by maintainers)

Top GitHub Comments

ConverJens commented, Aug 31, 2022

@jccarles Solution 1 has the upside (if it works) that you only need to deploy one Flink cluster, which can then be used for all jobs. That is the better option if it works, but as I mentioned, I had trouble with it.

Not that I recall, actually. The issues I have had have usually been about one worker crashing due to some problem, usually with the data I believe, after which the other workers (processes) eventually fail with a gRPC error.

Other issues that I’ve had have improved after updating Beam and Flink to the highest available versions.

I’m no expert, but I can have a look if you post your Flink charts.

ConverJens commented, Aug 30, 2022

@jccarles Actually, I created TFX Python custom components. They are very similar to Kubeflow components: https://www.tensorflow.org/tfx/guide/custom_function_component

To make this work, I actually created a second pipeline which spawned the Flink resources, started the TFX pipeline, waited for it to finish or crash, and then tore down Flink. In the event of a failure, the resources were kept for a longer period, say 24h.
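
The lifecycle described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: `run_with_flink` and its three callables are hypothetical stand-ins for whatever creates the Flink resources, runs the TFX pipeline, and deletes the resources, and the retention window is modeled as a simple in-process sleep.

```python
import time
from typing import Callable


def run_with_flink(start_flink: Callable[[], str],
                   run_pipeline: Callable[[str], None],
                   stop_flink: Callable[[str], None],
                   keep_on_failure_s: float = 24 * 3600,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Start Flink, run the pipeline, and always tear Flink down.

    Returns True on success. On failure the cluster is kept for
    keep_on_failure_s seconds (for debugging) before teardown.
    """
    cluster = start_flink()
    succeeded = False
    try:
        run_pipeline(cluster)
        succeeded = True
    except Exception as exc:
        print(f"pipeline failed: {exc!r}; keeping Flink for {keep_on_failure_s}s")
        sleep(keep_on_failure_s)
    finally:
        # Teardown runs on both the success and the failure path.
        stop_flink(cluster)
    return succeeded


# Usage with stub callables (all hypothetical).
events = []
ok = run_with_flink(
    start_flink=lambda: events.append("start") or "flink-demo",
    run_pipeline=lambda c: events.append(f"run:{c}"),
    stop_flink=lambda c: events.append(f"stop:{c}"),
    sleep=lambda s: None,
)
print(ok, events)
```

In a real setup the retention period would more likely be handled by a scheduled cleanup than by a blocking sleep; the sleep is only there to keep the sketch self-contained.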

Today, I believe this could be implemented with a custom TFX component that spins up Flink, wired in with .add_downstream_component (similar to Kubeflow’s .after), plus an exit handler that takes care of the graceful Flink shutdown.

No worries! I had so much pain with this, so if I can spare someone a portion of it, then that is awesome!

Hope this helps and feel free to let me know if you have any other questions!
