StatisticsGen cannot accept UnionChannel

System information

  • Have I specified the code to reproduce the issue (Yes, No): Yes
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Linux, but I specified reproducible code with Colab.
  • TensorFlow version: 2.7.0
  • TFX Version: 1.6.1
  • Python version: 3.7.12
  • Python dependencies (from pip freeze output): You can check this output in the below Colab link

Describe the current behavior

When ingesting datasets from multiple data sources that share the same schema, I can merge multiple Examples channels using tfx.types.channel.union. After merging them, however, I cannot run StatisticsGen.

StatisticsGen cannot accept a UnionChannel (tfx.types.channel.UnionChannel): https://github.com/tensorflow/tfx/blob/master/tfx/types/channel.py

Describe the expected behavior

StatisticsGen with UnionChannel inputs runs without any errors.

Standalone code to reproduce the issue

https://colab.research.google.com/drive/1J0FX9mdJHdGeRSOENxsk78m2gBn786o4?usp=sharing

Name of your Organization (Optional)

Other info / logs

I tested Transform and Trainer components with UnionChannel, and I could run them without any errors.

I think this is because Transform and Trainer handle examples as a variable-length list, whereas StatisticsGen expects a list containing exactly one artifact (it calls artifact_utils.get_single_instance).
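The difference can be illustrated with a minimal plain-Python sketch. It does not import TFX: the `Examples` class, both executor functions, and the URIs are hypothetical stand-ins, and `get_single_instance` is only a simplified approximation of the helper named above, so treat the exact error text as an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Examples:
    """Hypothetical stand-in for a TFX Examples artifact."""
    uri: str


def get_single_instance(artifacts: List[Examples]) -> Examples:
    """Simplified stand-in for artifact_utils.get_single_instance (assumed behavior)."""
    if len(artifacts) != 1:
        raise ValueError(f"expected list length of one but got {len(artifacts)}")
    return artifacts[0]


def list_style_executor(examples: List[Examples]) -> List[str]:
    """Accepts any number of artifacts, as Transform and Trainer appear to."""
    return [artifact.uri for artifact in examples]


def single_style_executor(examples: List[Examples]) -> str:
    """Requires exactly one artifact, as StatisticsGen appears to."""
    return get_single_instance(examples).uri


# A union of two channels resolves to two artifacts (made-up URIs).
merged = [Examples("/data/source_a"), Examples("/data/source_b")]

print(list_style_executor(merged))

try:
    single_style_executor(merged)
except ValueError as err:
    print(f"single-instance executor failed: {err}")
```

The list-style function works with the merged input, while the single-instance function rejects it, which matches the behavior reported above.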

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

2 reactions
jiyongjung0 commented, Feb 15, 2022

I believe that this is an intentional limitation of StatisticsGen, because we thought every Examples artifact should be validated separately. @1025KB Do you have any other idea behind this restriction? (CC @caveness)

1 reaction
chongkong commented, Feb 22, 2022

Unfortunately, there is no plan to support ForEach in either LocalDagRunner or KubeflowDagRunner.
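For readers unfamiliar with the idea, the "validate every Examples artifact separately" pattern discussed in these comments can be sketched in plain Python. This is only an analogy, not the TFX ForEach API: `run_single_input_step` and `run_for_each` are hypothetical names, and the URIs are made up.

```python
from typing import Dict, List


def run_single_input_step(examples: List[str]) -> str:
    """Hypothetical step that, like StatisticsGen, accepts exactly one input."""
    if len(examples) != 1:
        raise ValueError(f"expected list length of one but got {len(examples)}")
    return f"stats for {examples[0]}"


def run_for_each(artifact_uris: List[str]) -> Dict[str, str]:
    """Run the single-input step once per artifact instead of once on the union."""
    return {uri: run_single_input_step([uri]) for uri in artifact_uris}


# Each source gets its own result, keyed by its (made-up) URI.
results = run_for_each(["/data/source_a", "/data/source_b"])
for uri, stats in results.items():
    print(uri, "->", stats)
```

In a pipeline whose runner lacks ForEach, this would presumably mean declaring one StatisticsGen per ExampleGen output by hand; that mapping is an assumption, not something confirmed in the thread.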
