StatisticsGen cannot accept UnionChannel
System information
- Have I specified the code to reproduce the issue (Yes, No): Yes
- Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Linux, but I specified reproducible code with Colab.
- TensorFlow version: 2.7.0
- TFX Version: 1.6.1
- Python version: 3.7.12
- Python dependencies (from pip freeze output): You can check this output in the Colab link below
Describe the current behavior
When ingesting datasets from multiple data sources that share the same schema, I could merge multiple example channels using tfx.types.channel.union. But after merging them, I couldn’t run StatisticsGen.
StatisticsGen cannot accept UnionChannel (tfx.types.channel.UnionChannel).
https://github.com/tensorflow/tfx/blob/master/tfx/types/channel.py
Describe the expected behavior
StatisticsGen with UnionChannel inputs runs without any errors.
Standalone code to reproduce the issue
https://colab.research.google.com/drive/1J0FX9mdJHdGeRSOENxsk78m2gBn786o4?usp=sharing
Name of your Organization (Optional)
Other info / logs
I tested Transform and Trainer components with UnionChannel, and I could run them without any errors.
I think this is because Transform and Trainer handle examples as a variable-length list, whereas StatisticsGen expects a list containing exactly one artifact (artifact_utils.get_single_instance).
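To illustrate the suspected difference, here is a minimal pure-Python sketch. It does not import TFX: the Examples class, both consumer functions, and the URIs are hypothetical stand-ins, and get_single_instance is a simplified imitation of the artifact_utils helper named above, assuming that helper rejects any list that does not hold exactly one artifact.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Examples:
    """Hypothetical stand-in for a TFX Examples artifact."""
    uri: str


def get_single_instance(artifacts: List[Examples]) -> Examples:
    # Simplified imitation of artifact_utils.get_single_instance (assumption):
    # accept only a list that holds exactly one artifact.
    if len(artifacts) != 1:
        raise ValueError(f"expected exactly one artifact, got {len(artifacts)}")
    return artifacts[0]


def single_instance_consumer(artifacts: List[Examples]) -> str:
    # Pattern StatisticsGen appears to follow: exactly one Examples artifact.
    return get_single_instance(artifacts).uri


def list_consumer(artifacts: List[Examples]) -> List[str]:
    # Pattern Transform and Trainer appear to follow: any number of artifacts.
    return [artifact.uri for artifact in artifacts]


# A merged channel resolves to several artifacts, one per data source.
union_like = [Examples("/data/source_a"), Examples("/data/source_b")]

list_result = list_consumer(union_like)

try:
    single_instance_consumer(union_like)
    single_failed = False
except ValueError:
    single_failed = True
```

With two artifacts, the list-based consumer returns both URIs, while the single-instance consumer raises ValueError. If the real component behaves this way, that would explain why only StatisticsGen fails on merged channels.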
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:5 (4 by maintainers)
Top GitHub Comments
I believe that this is an intentional limitation of StatisticsGen, because we thought every Examples artifact should be validated separately. @1025KB Do you have any other idea behind this restriction? (CC @caveness)
Unfortunately, there’s no plan to support ForEach in LocalDagRunner or KubeflowDagRunner.