TFX with Dataflow Python Version Error
When running a pipeline with Dataflow, the BQExampleGen job always works; however, StatisticsGen, SchemaGen, and ExampleValidator always fail, saying the Python version in 'setup.py' is the issue. Here is a Colab file reproducing the issue.
This Beam issue briefly addresses the problem; however, upon testing the recommended Python versions (>=3.7.4), I get an error that Dataflow requires Python 3.6.9. Is there any way to circumvent this?
The exact error in the Dataflow logs appears as follows:
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 650, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 150, in execute
    test_shuffle_sink=self._test_shuffle_sink)
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 116, in create_operation
    is_streaming=False)
  File "apache_beam/runners/worker/operations.py", line 932, in apache_beam.runners.worker.operations.create_operation
  File "apache_beam/runners/worker/operations.py", line 766, in apache_beam.runners.worker.operations.create_pgbk_op
  File "apache_beam/runners/worker/operations.py", line 822, in apache_beam.runners.worker.operations.PGBKCVOperation.__init__
  File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 283, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type
    return _reverse_typemap[name]
KeyError: 'ClassType'
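
The site-packages paths in the traceback show that the Dataflow worker is running Python 3.6. As a rough diagnostic, and assuming (this is not confirmed in the thread) that the failure is related to a difference between the Python used to submit the job and the one on the worker, the sketch below pulls the interpreter version out of the traceback paths and compares it with the local interpreter. The function names `worker_python_versions` and `matches_local` are hypothetical helpers written for this illustration, not part of TFX, Beam, or Dataflow.

```python
import re
import sys


def worker_python_versions(traceback_text):
    """Return the set of (major, minor) versions found in site-packages paths."""
    found = re.findall(r"/python(\d+)\.(\d+)/site-packages/", traceback_text)
    return {(int(major), int(minor)) for major, minor in found}


def matches_local(traceback_text, local=None):
    """True if every version seen in the traceback equals the local one.

    `local` defaults to the interpreter running this script; pass a tuple
    such as (3, 7) to check a different submitting environment.
    """
    if local is None:
        local = tuple(sys.version_info[:2])
    return worker_python_versions(traceback_text) == {local}


# One line copied from the traceback above.
sample = (
    'File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", '
    "line 577, in _load_type"
)

print("Worker Python version(s):", worker_python_versions(sample))
print("Same as local interpreter:", matches_local(sample))
```

If the two versions differ, that is only a hint worth investigating; it does not by itself prove that the pickling error comes from the mismatch.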
Issue Analytics
- State:
- Created 4 years ago
- Comments:27 (3 by maintainers)
I’ll try it now!
I uninstalled cloudpickle as a test and wound up with the same error.
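
When testing changes like removing cloudpickle, it can help to record which serialization-related packages are actually installed in the submitting environment. The following is a minimal sketch, assuming Python 3.8 or newer locally (for `importlib.metadata`); the helper name `installed_versions` and the default package list are illustrative choices, not something prescribed in this thread.

```python
from importlib import metadata


def installed_versions(packages=("dill", "cloudpickle", "apache-beam", "tfx")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


for package, version in installed_versions().items():
    print(f"{package}: {version if version is not None else 'not installed'}")
```

This only reports the local environment; the packages on the Dataflow workers may differ.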