Transform component, running beam with Flink Runner. Flink job fails with ModuleNotFoundError.

System information

  • Have I specified the code to reproduce the issue (Yes, No): No
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Linux
  • TensorFlow version: 2.5.3
  • TFX Version: 1.2.0
  • Python version: 3.7
  • Python dependencies (from pip freeze output):

The Docker image for the SDK workers is built from apache/beam_python3.7_sdk:2.39.0, with tfx==1.2.0 and tensorflow==2.5.3.

Describe the current behavior

I am running a TFX pipeline with the FlinkRunner for Beam. The Transform component manages to start the Flink job, but the job fails immediately with ModuleNotFoundError: No module named 'pipeline_functions'.
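One way to narrow this down is to check, from inside the SDK worker container, whether the module can be located at all. The snippet below is a minimal sketch using only the standard library; the helper name is_importable is made up for illustration, and pipeline_functions is the module name taken from the error above.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package is
        # missing, and ValueError for some already-loaded modules without a spec.
        return False


# Run this with the same Python interpreter the SDK worker uses.
print("pipeline_functions importable:", is_importable("pipeline_functions"))
```

If this reports False inside the worker image, the user code never reached the worker's Python environment, which points at packaging or staging rather than at the Transform logic itself.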

Where can I find the code responsible for adding the user code as --extra_package, or for actually installing the wheel that contains my user code? I was not able to locate it in the repository…
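For reference, this is roughly what passing a user-code wheel explicitly through Beam pipeline options could look like. It is a sketch under assumptions, not TFX's own packaging logic: the helper flink_beam_args, the Flink address, the worker-pool address, the choice of an EXTERNAL environment, and the wheel path are all placeholders. The flags themselves (--runner, --flink_master, --environment_type, --environment_config, --extra_package) are standard Beam pipeline options. Whether this resolves the error in this particular setup is not confirmed anywhere in the thread.

```python
from typing import List, Optional


def flink_beam_args(
    flink_master: str,
    worker_pool: str,
    extra_package: Optional[str] = None,
) -> List[str]:
    """Build Beam pipeline args for a Flink cluster with external SDK workers."""
    args = [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        # Assumption: SDK workers run as an external worker pool.
        "--environment_type=EXTERNAL",
        f"--environment_config={worker_pool}",
    ]
    if extra_package:
        # Stage the wheel containing the user module alongside the job.
        args.append(f"--extra_package={extra_package}")
    return args


# All values below are hypothetical placeholders.
beam_pipeline_args = flink_beam_args(
    flink_master="localhost:8081",
    worker_pool="localhost:50000",
    extra_package="dist/pipeline_functions-0.1-py3-none-any.whl",
)
```

A list like this can be handed to the TFX pipeline through its beam_pipeline_args argument.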

I also wonder whether this comment is still “actual”, that is, still applicable given my versions of TFX and Beam.

Should I disable the wheel packaging and set force_tf_compat_v1=True in the Transform component? If so, what are the implications?

More generally, I was hoping to start a discussion with other TFX users who have a similar setup. We are moving from running TFX pipelines with GCS for storage and Dataflow as the Beam runner to running them with HDFS for storage and Flink as the Beam runner. The journey is tortuous, and I would like to hear from others willing to share their experience.

Thank you!

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

1 reaction
ConverJens commented, Aug 31, 2022

@jccarles Solution 1 has the upside (if it works) that you only need to deploy one Flink cluster, which can then be used for all jobs. That is the better option if it works, but as I mentioned, I had trouble with it.

Not that I recall, actually. The issues I have had were usually about one worker crashing due to some problem, usually data-related I believe, after which the other workers (processes) eventually fail with a gRPC error.

Other issues I’ve had improved after updating Beam and Flink to the highest available versions.

I’m no expert, but I can have a look if you post your Flink charts.

1 reaction
ConverJens commented, Aug 30, 2022

@jccarles Actually, I created TFX Python custom components. They are very similar to Kubeflow components: https://www.tensorflow.org/tfx/guide/custom_function_component

To make this work, I created a second pipeline that spawned the Flink resources, started the TFX pipeline, waited for it to finish or crash, and then tore down Flink. In the event of a failure, the resources were kept for a longer period, say 24h.

Today I believe this could be implemented with a custom TFX component that spins up Flink, combined with .add_downstream_component (similar to Kubeflow’s .after) and an exit handler that takes care of the graceful Flink shutdown.
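The spin-up, run, and tear-down flow described above can be sketched in plain Python. This is only an illustration of the control flow, not the actual implementation used: the function run_with_flink and all four callables are hypothetical stand-ins for whatever creates the Flink resources, runs the TFX pipeline, and cleans up.

```python
from typing import Callable


def run_with_flink(
    start_flink: Callable[[], None],
    run_pipeline: Callable[[], None],
    teardown_flink: Callable[[], None],
    schedule_teardown: Callable[[int], None],
    keep_on_failure_hours: int = 24,
) -> bool:
    """Start Flink, run the pipeline, then tear Flink down.

    On failure the cluster is kept for debugging and a delayed teardown is
    scheduled instead. Returns True if the pipeline succeeded.
    """
    start_flink()
    try:
        run_pipeline()
    except Exception as err:
        print(f"Pipeline failed: {err!r}; keeping Flink for {keep_on_failure_hours}h")
        schedule_teardown(keep_on_failure_hours)
        return False
    # Success path: release the Flink resources right away.
    teardown_flink()
    return True
```

Keeping the callables injectable makes it possible to swap in Helm, kubectl, or a custom component without changing the flow, and to test the failure path without a real cluster.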

No worries! I had so much pain with this, so if I can spare someone a portion of it, that is awesome!

Hope this helps and feel free to let me know if you have any other questions!
