TFX transform: decode an encrypted .tfrecord file

The use case may sound unusual. In the Transform component, I want to call a third-party function (from the cryptography package) on a previously encrypted feature stored in the TFRecord file. To do that I use tft.apply_pyfunc, which seems to call tf.py_func (deprecated in TF 2.0) rather than the newer tf.py_function. Is this the right way to do it?

import tensorflow as tf
import tensorflow_transform as tft

# decrypt, CRYPTO_KEY, _IMAGE_KEY, _LABEL_KEY and _transformed_name are
# assumed to be defined elsewhere in the module.

def preprocessing_fn(inputs, custom_config):
  outputs = {}

  def apply_decrypt(value):
    # Plain Python call into the third-party decryption function.
    return decrypt(value, str.encode(CRYPTO_KEY))

  def smart_decode(x, shape):
    # This line is relevant: wraps the Python function as a graph node named "decrypt".
    decrypted = tft.apply_pyfunc(apply_decrypt, tf.string, True, "decrypt", x)
    decrypted = tf.reshape(decrypted, shape=tf.shape(x))

    # Decode as JPEG or PNG depending on the decrypted bytes.
    decoded = tf.cond(
        tf.image.is_jpeg(decrypted),
        lambda: tf.image.decode_jpeg(decrypted, channels=3),
        lambda: tf.image.decode_png(decrypted, channels=3))
    resized = tf.image.resize(decoded, shape)
    casted = tf.dtypes.cast(resized, tf.uint8)
    return casted

  # Decrypt and decode every image in the batch.
  image_features = tf.map_fn(
      lambda x: smart_decode(x[0], custom_config["input_shape"]),
      inputs[_IMAGE_KEY],
      dtype=tf.uint8)

  image_features = tf.dtypes.cast(image_features, tf.float32)
  outputs[_transformed_name(_IMAGE_KEY)] = image_features

  # One-hot encode the labels.
  classes_nb = len(custom_config["labels"])
  labels = tf.one_hot(inputs[_LABEL_KEY], classes_nb)
  labels = tf.reshape(labels, shape=(-1, classes_nb))
  outputs[_transformed_name(_LABEL_KEY)] = labels
  return outputs

My understanding is that preprocessing_fn may be serialized, so apply_decrypt may be lost in the process. In any case, apply_decrypt is never called.

Here is the error I receive:

 ValueError: callback pyfunc_5 is not found
 [[{{node decrypt}}]]".
Batch instances: pyarrow.RecordBatch
image: large_list<item: large_binary>
child 0, item: large_binary
label: large_list<item: int64>
child 0, item: int64,
Fetching the values for the following Tensor keys: ['image_xf', 'label_xf']. [while running 'Transform[TransformIndex0]/Transform']

Also, there is a long passage in the apply_pyfunc documentation that I’m not sure I understand well; it may be related.

Thanks. (for saving the lonely intern again)

docs : https://www.tensorflow.org/tfx/transform/api_docs/python/tft/apply_pyfunc

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
Arghya999 commented, Dec 3, 2020

TensorFlow code is executed by C++. tft.apply_pyfunc is a wrapper around tf.py_func. I suspect that during unpickling (deserialization), “transform_raw_features” requires a Python environment, which TF doesn’t run on. Hence the error. To work around it, I looked in tf.strings for an equivalent encode/decode function, but this is currently not supported.
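
The "callback pyfunc_5 is not found" message is consistent with how tf.py_func is documented to behave: the body of the Python function is not serialized with the graph, only a reference to it. The toy model below sketches that mechanism in plain Python. It is not TensorFlow's actual code; FuncRegistry, build_graph and run_graph are hypothetical names made up for the illustration, and the string reversal only stands in for real decryption.

```python
import json


class FuncRegistry:
    """Toy, process-local registry mapping tokens to Python callables."""

    def __init__(self):
        self._funcs = {}
        self._next_id = 0

    def register(self, func):
        # Hand out a token; the callable itself never leaves this object.
        token = f"pyfunc_{self._next_id}"
        self._next_id += 1
        self._funcs[token] = func
        return token

    def call(self, token, *args):
        func = self._funcs.get(token)
        if func is None:
            raise ValueError(f"callback {token} is not found")
        return func(*args)


def build_graph(registry, func, name):
    # The serialized "graph" stores only the token, not the function body.
    token = registry.register(func)
    return json.dumps({"node": name, "token": token})


def run_graph(registry, serialized_graph, value):
    node = json.loads(serialized_graph)
    return registry.call(node["token"], value)


def apply_decrypt(value):
    # Stand-in for real decryption: just reverse the string.
    return value[::-1]


# The process that builds the graph can resolve the token.
building_process = FuncRegistry()
graph = build_graph(building_process, apply_decrypt, "decrypt")
same_process_result = run_graph(building_process, graph, "terces")

# A "worker" that only receives the serialized graph has an empty registry.
worker_process = FuncRegistry()
try:
    run_graph(worker_process, graph, "terces")
    worker_error = None
except ValueError as err:
    worker_error = str(err)

print(same_process_result)
print(worker_error)
```

In this model the call succeeds where the function was registered and fails wherever only the serialized graph is available, which matches the symptom in the issue that apply_decrypt is never called.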

0 reactions
jashshah commented, Jun 10, 2021

Can it thus be said that one cannot use tf.py_function (from TF 2.x) inside preprocessing_fn when using TFX Transform?

I am facing a related issue as mentioned in the StackOverflow question over here.
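
If the decryption does not have to live inside the TensorFlow graph, one option is to decrypt the raw records in plain Python before they reach Transform, so that preprocessing_fn contains only native TF ops. This is an assumption, not something confirmed in the thread. A minimal sketch, in which decrypt_records and the XOR-based toy_decrypt are hypothetical stand-ins for the real pipeline step and the real cryptography call:

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]


def toy_decrypt(data: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in cipher (repeating-key XOR). Not secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decrypt_records(
    records: Iterable[Record],
    feature: str,
    decrypt_fn: Callable[[bytes], bytes],
) -> Iterator[Record]:
    """Yield copies of the records with one feature decrypted."""
    for record in records:
        decrypted = dict(record)  # leave the input record untouched
        decrypted[feature] = decrypt_fn(record[feature])
        yield decrypted


key = b"k3y"
plain_image = b"\x89PNG fake image bytes"

# XOR is symmetric, so the same function produces the "encrypted" input.
encrypted = [{"image": toy_decrypt(plain_image, key), "label": 1}]

decrypted = list(
    decrypt_records(encrypted, "image", lambda data: toy_decrypt(data, key))
)
```

The sketch only shows the shape of such a step; where it would plug into a TFX pipeline (for example at ingestion time) is left open and has not been tested against TFX.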
