TFX with Dataflow Python Version Error

When I run a pipeline on Dataflow, the BQExampleGen job always succeeds, but StatisticsGen, SchemaGen, and ExampleValidator always fail with an error that points to the Python version in ‘setup.py’. Here is a Colab file reproducing the issue.

This Beam issue briefly addresses the problem. However, when I tested the recommended Python versions (>=3.7.4), I got an error saying that Dataflow requires Python 3.6.9. Is there any way to work around this?
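To make the version mismatch described above concrete, here is a minimal sketch of a check one could run before submitting the pipeline. It assumes the workers run Python 3.6, as suggested by the 3.6.9 requirement mentioned above; `ASSUMED_WORKER_PYTHON` and `matches_worker_python` are hypothetical names used for illustration, not part of TFX, Beam, or Dataflow.

```python
import sys

# Assumption: Dataflow workers in this setup run Python 3.6 (the error
# message mentions 3.6.9). Adjust this if your workers use another version.
ASSUMED_WORKER_PYTHON = (3, 6)


def matches_worker_python(local_version=None, worker_version=ASSUMED_WORKER_PYTHON):
    """Return True if the local major.minor version equals the worker's."""
    if local_version is None:
        local_version = sys.version_info
    return tuple(local_version[:2]) == tuple(worker_version)


if __name__ == "__main__":
    if matches_worker_python():
        print("Local Python matches the assumed worker version.")
    else:
        print(
            "Local Python is %d.%d but the workers are assumed to run %d.%d; "
            "objects pickled locally may not unpickle on the workers."
            % (sys.version_info[0], sys.version_info[1],
               ASSUMED_WORKER_PYTHON[0], ASSUMED_WORKER_PYTHON[1])
        )
```

This only compares interpreter versions. It does not check the versions of Beam, dill, or other packages, which may also need to match between the submitting environment and the workers.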

The exact error in the Dataflow logs is as follows:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 650, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 150, in execute
    test_shuffle_sink=self._test_shuffle_sink)
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 116, in create_operation
    is_streaming=False)
  File "apache_beam/runners/worker/operations.py", line 932, in apache_beam.runners.worker.operations.create_operation
  File "apache_beam/runners/worker/operations.py", line 766, in apache_beam.runners.worker.operations.create_pgbk_op
  File "apache_beam/runners/worker/operations.py", line 822, in apache_beam.runners.worker.operations.PGBKCVOperation.__init__
  File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 283, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type
    return _reverse_typemap[name]
KeyError: 'ClassType'
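
The interpreter version the worker is actually running can be read off the file paths in the traceback above. A small sketch, assuming the paths follow the `/pythonX.Y/` layout seen here; the helper name `worker_python_versions` is hypothetical:

```python
import re


def worker_python_versions(traceback_text):
    """Return the sorted, de-duplicated X.Y versions found in '/pythonX.Y/' path segments."""
    return sorted(set(re.findall(r"/python(\d+\.\d+)/", traceback_text)))


# One line copied from the traceback above.
sample = (
    'File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", '
    "line 577, in _load_type"
)
print(worker_python_versions(sample))  # prints ['3.6']
```

Here the worker reports Python 3.6, so a pipeline submitted from a different interpreter version would be deserialized under 3.6 on the worker side.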

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 27 (3 by maintainers)

Top GitHub Comments

1 reaction
luischinchillagarcia commented, Feb 4, 2020

I’ll try it now!

0 reactions
stfines-clgx commented, Sep 16, 2021

I uninstalled cloudpickle as a test and wound up with the same error.

Top Results From Across the Web

Facing issues running tensorflow_io library on dataflow in a tfx ...
I am currently facing a related issue on dataflow when using tfx library. The tfx pipeline works fine locally but it fails on...

python 2.7 - Problem with Tensorflow Transform(TFX ...
I have some problems running a Apache beam job on Dataflow. The code runs fine on a small dataset but when runing a...

Using TFX inference with Dataflow for large scale ML ...
In this post, we walk through the use of the RunInference API from tfx-bsl, a utility transform from TensorFlow Extended (TFX), ...

TFX Evaluator does not run in Dataflow so it fails due to lack of ...
I am running a pipeline in AI Platform pipelines based on TFX. All components run fine until the Evaluator. It just does not...

Deep Dive into ML Models in Production Using TensorFlow ...
For this article, I'll use the Tensorflow 2.1 version with no GPU ... Install tfx and kfp Python packages. import sys !{sys.executable} -m ......