TFX Transform component reports ValueError "The preprocessing function returned an empty dict"

The Transform component in my TFX pipeline raises ValueError “The preprocessing function returned an empty dict”. Sometimes the error disappears if I re-run the same pipeline without any changes. Recently it has been happening more frequently for some reason.

Here’s the preprocessing function:

def preprocess_eval(input_features):
    return input_features

Here’s the traceback:

Traceback (most recent call last):
  File "local_runner.py", line 115, in <module>
    run()
  File "local_runner.py", line 110, in run
    enable_cache=False))
  File "/Users/gxiao/miniconda3/envs/lingo/lib/python3.7/site-packages/tfx/orchestration/local/local_dag_runner.py", line 129, in run
    component_launcher.launch()
  File "/Users/gxiao/miniconda3/envs/lingo/lib/python3.7/site-packages/tfx/orchestration/portable/launcher.py", line 412, in launch
    executor_output = self._run_executor(execution_info)
  File "/Users/gxiao/miniconda3/envs/lingo/lib/python3.7/site-packages/tfx/orchestration/portable/launcher.py", line 306, in _run_executor
    executor_output = self._executor_operator.run_executor(execution_info)
  File "/Users/gxiao/miniconda3/envs/lingo/lib/python3.7/site-packages/tfx/orchestration/portable/python_executor_operator.py", line 112, in run_executor
    execution_info.exec_properties)
  File "/Users/gxiao/miniconda3/envs/lingo/lib/python3.7/site-packages/tfx/components/transform/executor.py", line 466, in Do
    self.Transform(label_inputs, label_outputs, status_file)
  File "/Users/gxiao/miniconda3/envs/lingo/lib/python3.7/site-packages/tfx/components/transform/executor.py", line 985, in Transform
    len(analyze_data_paths))
  File "/Users/gxiao/miniconda3/envs/lingo/lib/python3.7/site-packages/tfx/components/transform/executor.py", line 1119, in _RunBeamImpl
    preprocessing_fn, pipeline=pipeline))
  File "/Users/gxiao/.local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 1058, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/Users/gxiao/.local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 573, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/gxiao/.local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 646, in apply
    return self.apply(transform, pvalueish)
  File "/Users/gxiao/.local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 689, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/Users/gxiao/.local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 188, in apply
    return m(transform, input, options)
  File "/Users/gxiao/.local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 218, in apply_PTransform
    return transform.expand(input)
  File "/Users/gxiao/miniconda3/envs/lingo/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 1182, in expand
    self).expand(self._make_parent_dataset(dataset))
  File "/Users/gxiao/miniconda3/envs/lingo/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 1049, in expand
    raise ValueError('The preprocessing function returned an empty dict')
ValueError: The preprocessing function returned an empty dict

I checked the source code, and the error seems to be raised because the structured_outputs of the ConcreteFunction is empty for some reason. Any idea how to fix it?
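Because the preprocessing function above returns its input unchanged, an empty output dict implies that it was handed an empty input dict. A minimal sketch of a guard that surfaces this earlier, with a more specific message, is shown below. The name preprocess_eval_checked and the error text are hypothetical and not part of the original pipeline.

```python
def preprocess_eval_checked(input_features):
    """Identity preprocessing function with a diagnostic guard.

    Hypothetical variant of preprocess_eval: it fails fast when no input
    features arrive, instead of returning an empty dict.
    """
    if not input_features:
        # An empty dict here means no features reached the function at all.
        raise ValueError(
            "Preprocessing function received no input features; "
            "inspect the schema passed to Transform."
        )
    return input_features
```

With this guard, the failure points at the inputs (and therefore at the schema) rather than at the function's return value.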

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

3 reactions
EdwardCuiPeacock commented, Oct 30, 2021

Update: it turns out that lifecycle_stage: DEPRECATED in the schema was the problem. One of the ExampleGen splits was empty, which made SchemaGen mark all the fields as empty.
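One way to check for this condition is to scan the schema file that SchemaGen writes and list the features carrying lifecycle_stage: DEPRECATED. The sketch below uses only the standard library and assumes the schema is in multi-line protobuf text format, with one field per line; the sample schema and the function name deprecated_features are made up for illustration. If tensorflow_metadata is available, parsing the file into its Schema proto would likely be more robust than this line-based scan.

```python
import re
from typing import List

# Made-up sample in the multi-line text format assumed by this sketch.
SAMPLE_SCHEMA = """
feature {
  name: "age"
  type: INT
  lifecycle_stage: DEPRECATED
  presence {
    min_count: 1
  }
}
feature {
  name: "label"
  type: INT
  presence {
    min_fraction: 1.0
  }
}
"""


def deprecated_features(schema_pbtxt: str) -> List[str]:
    """Return names of top-level features marked lifecycle_stage: DEPRECATED."""
    deprecated = []
    depth = 0           # current brace nesting depth
    in_feature = False  # True while inside a top-level `feature { ... }` block
    name, stage = None, None
    for raw in schema_pbtxt.splitlines():
        line = raw.strip()
        if depth == 0 and re.match(r"feature\s*\{", line):
            in_feature, name, stage = True, None, None
        elif in_feature and depth == 1:
            # Only read fields directly inside the feature block,
            # not inside nested messages such as `presence { ... }`.
            m = re.match(r'name:\s*"([^"]*)"', line)
            if m:
                name = m.group(1)
            m = re.match(r"lifecycle_stage:\s*(\w+)", line)
            if m:
                stage = m.group(1)
        depth += line.count("{") - line.count("}")
        if in_feature and depth == 0:
            # The feature block just closed; record it if deprecated.
            if stage == "DEPRECATED" and name is not None:
                deprecated.append(name)
            in_feature = False
    return deprecated


print(deprecated_features(SAMPLE_SCHEMA))  # ['age']
```

If every feature in the real schema shows up in this list, Transform has nothing to pass to the preprocessing function, which matches the empty-dict error reported above.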

2 reactions
glenxiao commented, Oct 30, 2021

In my case the output of ExampleGen was not empty either, if I remember correctly (it was a few months ago). My original guess was that SchemaGen may not work well with a relatively small input. Thank you so much for sharing your case about lifecycle_stage, @EdwardCuiPeacock. I will definitely try it later.
