Schema and statistics in Transform TFX Pipeline Component
The main entry point in the Transform component (preprocessing_fn) should also provide the computed Stats and Schema next to the inputs. In some scenarios users might want to benefit from the statistics, e.g. to eliminate unnecessary features.
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:5 (1 by maintainers)
I’d like to re-up this request—having easier access to the metadata and schema from within the Transform callback would be tremendously useful. Some feature transformations could depend on the inferred schema. Is there some other workaround?
@zoyahav Indeed. Currently, in the TFX Transform pipeline component, preprocessing_fn receives only the inputs. I am proposing to change it so that it also receives the stats and schema or, even better, to express it as a higher-order function / factory.
That way my preprocessing function could depend on the stats and schema. I know I could fetch the Stats and Schema manually, but that requires me to talk to the SQLite database and keep track of paths, etc.
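The code snippets from this comment did not survive on this page, so what follows is only a rough sketch of the two shapes described above, not the author's original code and not an existing TFX API. Everything in it is an assumption made for illustration: the names preprocessing_fn_with_metadata and make_preprocessing_fn, the "missing_fraction" key, and the use of plain dicts in place of the real statistics and schema objects.

```python
from typing import Any, Callable, Dict

Inputs = Dict[str, Any]
PreprocessingFn = Callable[[Inputs], Inputs]

# Hypothetical stand-ins: plain dicts instead of the real statistics and
# schema objects, so the sketch runs without TFX installed.
Stats = Dict[str, Dict[str, float]]  # feature name -> summary numbers
Schema = Dict[str, str]              # feature name -> declared type


def preprocessing_fn_with_metadata(
    inputs: Inputs, stats: Stats, schema: Schema
) -> Inputs:
    """Variant 1 (hypothetical): stats and schema passed next to the inputs."""
    outputs = {}
    for name, value in inputs.items():
        # Skip features the schema does not know about.
        if name not in schema:
            continue
        # Skip features that are mostly missing according to the stats.
        if stats.get(name, {}).get("missing_fraction", 0.0) > 0.5:
            continue
        outputs[name] = value
    return outputs


def make_preprocessing_fn(stats: Stats, schema: Schema) -> PreprocessingFn:
    """Variant 2 (hypothetical): a factory that closes over stats and schema
    and returns a function with the existing one-argument shape."""

    def preprocessing_fn(inputs: Inputs) -> Inputs:
        return preprocessing_fn_with_metadata(inputs, stats, schema)

    return preprocessing_fn
```

One possible argument for the factory variant is that the returned function keeps the current one-argument shape, so code that only knows how to call preprocessing_fn(inputs) would not need to change.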