TFX >= 1.4.0 fails with S3 as backend due to tensorflow-io not being imported
Up to TensorFlow 2.5.*, the additional filesystems were part of TensorFlow itself, but from TF 2.6 they have been moved to tensorflow-io (tf-io). However, tensorflow-io isn’t imported in tfx/orchestration/kubeflow/container_entrypoint.py, and hence the S3 filesystem (and several others) can’t be used.
- Have I specified the code to reproduce the issue (Yes, No): No
- Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): KubeFlow, Ubuntu image
- TensorFlow version: 2.7
- TFX Version: 1.5
- Python version: 3.7
- Python dependencies (from pip freeze output):
Describe the current behavior: TFX >= 1.4.0 fails with S3 as backend because tensorflow-io is not imported.
Describe the expected behavior: The S3 filesystem should work.
Standalone code to reproduce the issue: Any simple pipeline that uses S3 as its storage backend.
Other info / logs
INFO:absl:Going to run a new execution 27735
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 476, in <module>
    main(sys.argv[1:])
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 468, in main
    execution_info = component_launcher.launch()
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/portable/launcher.py", line 524, in launch
    execution_preparation_result = self._prepare_execution()
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/portable/launcher.py", line 384, in _prepare_execution
    self._output_resolver.get_executor_output_uri(execution.id)),
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/portable/outputs_utils.py", line 169, in get_executor_output_uri
    fileio.makedirs(execution_dir)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/dsl/io/fileio.py", line 80, in makedirs
    _get_filesystem(path).makedirs(path)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/dsl/io/plugins/tensorflow_gfile.py", line 71, in makedirs
    tf.io.gfile.makedirs(path)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py", line 515, in recursive_create_dir_v2
    _pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 's3' not implemented (file: 's3://pipelines/tfx/trace_model_pipeline/TimeBasedExampleGen/.system/executor_execution/27735')
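The error above is keyed on the URI scheme ('s3'). As a rough diagnostic, a pipeline root can be checked up front for a scheme that is no longer built into TensorFlow. This is only a sketch: the helper names and the scheme set are assumptions for illustration (the set lists just the schemes named in this thread, S3 and HDFS), not part of TFX or TensorFlow.

```python
from urllib.parse import urlparse

# Assumption: only the schemes mentioned in this thread as having moved out of
# TensorFlow core in 2.6. The real list may be longer.
SCHEMES_NEEDING_TENSORFLOW_IO = frozenset({"s3", "hdfs"})


def uri_scheme(uri: str) -> str:
    """Return the lower-cased scheme of a URI, or '' for plain local paths."""
    return urlparse(uri).scheme.lower()


def needs_tensorflow_io(uri: str) -> bool:
    """Hypothetical helper: True if the URI uses a scheme from the set above."""
    return uri_scheme(uri) in SCHEMES_NEEDING_TENSORFLOW_IO
```

Calling needs_tensorflow_io on the pipeline root before launching a run would flag paths such as the s3://pipelines/... URI from the log, while local paths pass through.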
Issue Analytics
- State:
- Created 2 years ago
- Reactions:4
- Comments:44 (25 by maintainers)
Top GitHub Comments
The root cause is https://github.com/tensorflow/tensorflow/issues/51583. TF dropped S3 / HDFS support from 2.6, and I believe that all our packages are affected by this. We could support S3 by importing the tensorflow_io dependency in the repo.
This fix could potentially be included in the next release.
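Until such a fix lands, the idea can be approximated in user code by importing tensorflow_io early, for example at the top of a custom entrypoint or component module. The sketch below is an assumption about how one might guard that import, not the change made in TFX itself; the function name is hypothetical, and it assumes the tensorflow-io package is installed in the container image.

```python
import importlib
import logging


def try_import_filesystem_plugin(module_name: str = "tensorflow_io") -> bool:
    """Hypothetical helper: import a module for its side effects.

    Importing tensorflow_io is what makes the extra filesystems available to
    tf.io.gfile; here we only report whether the import succeeded.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        logging.warning(
            "%s is not installed; s3:// and similar paths may fail.", module_name
        )
        return False
    return True


# Call once at start-up, before any tf.io.gfile call touches an s3:// path.
PLUGIN_AVAILABLE = try_import_filesystem_plugin()
```

Catching only ImportError keeps other failures (for instance, a broken installation) visible rather than silently ignored.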
@varshaan Mostly great news: the issue in tf Transform seems to be resolved!
@jiyongjung0 Slightly worse news: a similar issue is still present in Evaluator (see log below). Can someone look at this ASAP? This is the same issue as in Transform before, so a simple import tensorflow_io will probably do the trick. This is the final component, so when this is fixed, TFX is officially S3 certified again.