TFX transform: decode an encrypted .tfrecord file
The use case may sound odd. In the Transform component, I want to call a third-party function (from the cryptography package) on a feature that was previously encrypted and stored in the TFRecord file. To do this I use tft.apply_pyfunc (which seems to call tf.py_func, deprecated in TF 2.0, rather than the new tf.py_function). Is this the right way to do it?
def preprocessing_fn(inputs, custom_config):
    outputs = {}

    def apply_decrypt(value):
        return decrypt(value, str.encode(CRYPTO_KEY))

    def smart_decode(x, shape):
        decrypted = tft.apply_pyfunc(apply_decrypt, tf.string, True, "decrypt", x)  # This line is relevant
        decrypted = tf.reshape(decrypted, shape=tf.shape(x))
        decoded = tf.cond(
            tf.image.is_jpeg(decrypted),
            lambda: tf.image.decode_jpeg(decrypted, channels=3),
            lambda: tf.image.decode_png(decrypted, channels=3))
        resized = tf.image.resize(decoded, shape)
        casted = tf.dtypes.cast(resized, tf.uint8)
        return casted

    image_features = tf.map_fn(
        lambda x: smart_decode(x[0], custom_config["input_shape"]),
        inputs[_IMAGE_KEY],
        dtype=tf.uint8)
    image_features = tf.dtypes.cast(image_features, tf.float32)
    outputs[_transformed_name(_IMAGE_KEY)] = image_features
    classes_nb = len(custom_config["labels"])
    labels = tf.one_hot(inputs[_LABEL_KEY], classes_nb)
    labels = tf.reshape(labels, shape=(-1, classes_nb))
    outputs[_transformed_name(_LABEL_KEY)] = labels
    return outputs
My understanding is that preprocessing_fn may be serialized, so apply_decrypt may be lost in the process; apply_decrypt is never called.
Here is the error I receive:
ValueError: callback pyfunc_5 is not found
[[{{node decrypt}}]]".
Batch instances: pyarrow.RecordBatch
image: large_list<item: large_binary>
child 0, item: large_binary
label: large_list<item: int64>
child 0, item: int64,
Fetching the values for the following Tensor keys: ['image_xf', 'label_xf']. [while running 'Transform[TransformIndex0]/Transform']
Also, there is a long note in the apply_pyfunc documentation that I am not sure I understand; it may be related.
Thanks. (for saving the lonely intern again)
docs : https://www.tensorflow.org/tfx/transform/api_docs/python/tft/apply_pyfunc
Issue Analytics
- Created 3 years ago
- Comments:6 (1 by maintainers)
Top GitHub Comments
TensorFlow graphs are executed by the C++ runtime. tft.apply_pyfunc is a wrapper around tf.py_func. I suspect that during unpickling (deserialization), "transform_raw_features" requires a Python environment, which the TensorFlow runtime does not provide; hence the error. To work around it, I looked in tf.strings for an equivalent encode/decode function, but this is currently not supported.
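To make the "callback pyfunc_5 is not found" failure more concrete, here is a toy model in plain Python. It is not TensorFlow's actual code. It only assumes that a py_func-style op stores a string token in the graph while the Python function itself lives in a registry local to the process that built the graph. CallbackRegistry and toy_decrypt are hypothetical names invented for this sketch.

```python
import itertools


class CallbackRegistry:
    """Toy stand-in for a process-local registry of Python callbacks."""

    def __init__(self):
        self._funcs = {}
        self._ids = itertools.count()

    def register(self, func):
        # Only this token ends up in the "graph"; the function stays here.
        token = f"pyfunc_{next(self._ids)}"
        self._funcs[token] = func
        return token

    def call(self, token, *args):
        if token not in self._funcs:
            raise ValueError(f"callback {token} is not found")
        return self._funcs[token](*args)


def toy_decrypt(value: bytes) -> bytes:
    # Hypothetical placeholder for the real decrypt(); it just reverses bytes.
    return value[::-1]


# "Graph construction": register the function, keep only the token.
build_registry = CallbackRegistry()
serialized_graph = {"op": "PyFunc", "token": build_registry.register(toy_decrypt)}

# Same process: the token resolves and the callback runs.
same_process_result = build_registry.call(serialized_graph["token"], b"cba")

# "Another worker": the graph was shipped over, but its registry is empty.
worker_registry = CallbackRegistry()
try:
    worker_registry.call(serialized_graph["token"], b"cba")
    worker_error = None
except ValueError as err:
    worker_error = str(err)

print(same_process_result)
print(worker_error)
```

If this model is roughly right, the serialized transform graph contains a reference to the callback but not the callback itself. That would explain why apply_decrypt is never called once the graph is loaded somewhere else.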
Can it thus be said that one cannot use tf.py_function (from TF 2.x) inside preprocessing_fn when using TFX Transform? I am facing a related issue as mentioned in the StackOverflow question over here.