{SPAN} doesn't work as expected with GCS
System information
- Have I specified the code to reproduce the issue (Yes, No): Yes
- Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Vertex AI Pipeline, Vertex AI Notebook, GCS storage
- TensorFlow version: 2.6
- TFX Version: 1.2.0
- Python version: 3.7
- Python dependencies (from pip freeze output): None
Describe the current behavior
First of all, I have the CIFAR10 dataset in the following locations:
gs://cifar10-csp-public/cifar10/span-1/train/train.tfrecord
gs://cifar10-csp-public/cifar10/span-1/test/test.tfrecord
With ImportExampleGen defined as below, it fails to find the dataset under the specified pattern paths.
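# Imports assumed here; the original report did not show them.
from tfx import v1 as tfx
from tfx.proto import example_gen_pb2

data_path = "gs://cifar10-csp-public"
# {SPAN} is a placeholder that ExampleGen is expected to resolve to a span number.
input_config = example_gen_pb2.Input(splits=[
    example_gen_pb2.Input.Split(name='train',
                                pattern='cifar10/span-{SPAN}/train/*'),
    example_gen_pb2.Input.Split(name='val',
                                pattern='cifar10/span-{SPAN}/test/*')
])
example_gen = tfx.components.ImportExampleGen(input_base=data_path, input_config=input_config)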
Inspecting the logs shows that it complains the files don’t exist:
OSError: No files found based on the file pattern gs://cifar10-csp-public/cifar10/span-{SPAN}/train/*
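The pattern in the error message still contains the literal text {SPAN}, which suggests the placeholder was never substituted before the file lookup. The following is a minimal sketch in plain Python, not TFX's actual implementation, of the difference between globbing the raw pattern and first resolving the span. The helper names (literal_glob, latest_span) are hypothetical, and the path list simply mirrors the two files named above, relative to the bucket.

```python
import fnmatch
import re

SPAN_TOKEN = "{SPAN}"

# Paths relative to the bucket, taken from the report above.
PATHS = [
    "cifar10/span-1/train/train.tfrecord",
    "cifar10/span-1/test/test.tfrecord",
]


def literal_glob(paths, pattern):
    """Match paths against the pattern as-is, the way a plain glob would."""
    # "{" and "}" have no special meaning to fnmatch, so "{SPAN}" only
    # matches a path that literally contains that text.
    return [p for p in paths if fnmatch.fnmatchcase(p, pattern)]


def latest_span(paths, pattern):
    """Return the largest span number among matching paths, or None."""
    # Escape the pattern, then turn the span token into a digit group
    # and "*" into a wildcard.
    regex = re.escape(pattern)
    regex = regex.replace(re.escape(SPAN_TOKEN), r"(\d+)")
    regex = regex.replace(re.escape("*"), r".*")
    compiled = re.compile(regex + r"$")
    spans = [int(m.group(1)) for m in map(compiled.match, paths) if m]
    return max(spans) if spans else None


if __name__ == "__main__":
    pattern = "cifar10/span-{SPAN}/train/*"
    print(literal_glob(PATHS, pattern))  # no path contains the text "{SPAN}"
    print(latest_span(PATHS, pattern))   # span number parsed from the paths
```

If the component globs the unresolved pattern, the first call is the behaviour to expect: nothing matches, which is consistent with the OSError above.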
Describe the expected behavior
The expected behaviour is that ImportExampleGen correctly retrieves the data when {SPAN} is specified. As it didn’t work as expected, I tried the code below:
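# Imports assumed here; the original report did not show them.
from tfx import v1 as tfx
from tfx.components.example_gen import utils
from tfx.proto import example_gen_pb2

data_path = "gs://cifar10-csp-public"
splits = [
    example_gen_pb2.Input.Split(name='train', pattern='span-{SPAN}/train/*'),
    example_gen_pb2.Input.Split(name='val', pattern='span-{SPAN}/test/*')
]
# Resolve the span up front; the fingerprint (first return value) is not needed here.
_, span, version = utils.calculate_splits_fingerprint_span_and_version(data_path, splits)
# Build the real input config with the resolved span number in place of {SPAN}.
input_config = example_gen_pb2.Input(splits=[
    example_gen_pb2.Input.Split(name='train', pattern=f'span-{span}/train/*'),
    example_gen_pb2.Input.Split(name='val', pattern=f'span-{span}/test/*')
])
example_gen = tfx.components.ImportExampleGen(input_base=data_path, input_config=input_config)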
With the utility function calculate_splits_fingerprint_span_and_version, it works fine now. However, I wonder why it didn’t work in the first place. Doesn’t ImportExampleGen use the calculate_splits_fingerprint_span_and_version function internally?
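The workaround writes every pattern twice, once with {SPAN} and once as an f-string. A minimal sketch of keeping a single set of patterns and substituting the resolved span into them is shown below. The helper names (substitute_span, resolve_patterns) are hypothetical, and the sketch assumes only the plain {SPAN} token is used, with no other placeholders.

```python
SPAN_TOKEN = "{SPAN}"

# Split name -> pattern, as in the workaround above.
SPLIT_PATTERNS = {
    "train": "span-{SPAN}/train/*",
    "val": "span-{SPAN}/test/*",
}


def substitute_span(pattern, span):
    """Replace the {SPAN} token in a pattern with a concrete span number."""
    if SPAN_TOKEN not in pattern:
        raise ValueError(f"pattern has no {SPAN_TOKEN} token: {pattern!r}")
    return pattern.replace(SPAN_TOKEN, str(span))


def resolve_patterns(split_patterns, span):
    """Return a new dict with the span substituted into every split pattern."""
    return {name: substitute_span(p, span) for name, p in split_patterns.items()}


if __name__ == "__main__":
    # The span would come from the utility function in the workaround above;
    # 1 is used here because the report only mentions span-1.
    print(resolve_patterns(SPLIT_PATTERNS, 1))
```

The resolved strings can then be passed as the pattern arguments when building the second example_gen_pb2.Input, so the placeholder form stays the single source of truth.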
Issue Analytics
- Created 2 years ago
- Comments: 73 (3 by maintainers)
Top GitHub Comments
@deep-diver @1025KB The nightly build passed yesterday (or should we call it today in KST? 😄).
https://hub.docker.com/layers/tensorflow/tfx/1.4.0.dev20211013/images/sha256-7759ea95c4a83c4e2b8d210994ca4c15556d13cc2c2c83b57f943e9a1e444d01?context=explore
FYI, this PR should fix {SPAN} for Vertex (KubeflowV2DagRunner): https://github.com/tensorflow/tfx/pull/4347