[TPU, keras preprocessing layer] Some Op must be a compile-time constant.
Please go to TF Forum for help and support:
https://discuss.tensorflow.org/tag/keras
If you open a GitHub issue, here is our policy:
It must be a bug, a feature request, or a significant problem with the documentation (for small docs fixes please send a PR instead). The form below must be filled out.
Here's why we have that policy:
Keras developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
System information.
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): y
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): google colab
- TensorFlow installed from (source or binary): google colab
- TensorFlow version (use command below): 2.7
- Python version: google colab
- Bazel version (if compiling from source):
- GPU model and memory: TPU issue
- Exact command to reproduce: Here is a Colab notebook. You can reproduce this issue without any other code: just change the Colab runtime to a TPU device and run all cells.
You can collect some of this information using our environment capture script:
https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
You can obtain the TensorFlow version with: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
Describe the problem.
Hi! A TPU error is raised specifically when using Keras preprocessing layers. I've tried to connect two models: an augmentation model that contains preprocessing layers, and a segmentation model.
def new_concatenated_model(
    image_input_hw,
    mask_input_hw,
    class_n
):
    seg_model = create_segmentation_model(class_n)
    aug_model = create_augmentation_model(
        image_input_hw, mask_input_hw, class_n)
    image_input_shape = list(image_input_hw) + [3]

    @auto_tpu(device=CURRENT_DEVICE)  # decorator `auto_tpu` is just a context manager.
    def create():
        im = seg_model.input
        model = AugConcatedSegModel(
            inputs=im,
            outputs=seg_model(im),
            augmentation_model=aug_model,
            name='seg_model_train_with_aug'
        )
        return model

    model = create()
    return model
The train_step() code mainly came from the official TensorFlow tutorial.
class AugConcatedSegModel(tf.keras.Model):
    def __init__(
        self,
        inputs=None,
        outputs=None,
        augmentation_model=None,
        **kwargs
    ):
        super().__init__(inputs=inputs, outputs=outputs, **kwargs)
        self.augmentation_model = augmentation_model

    def train_step(self, data):
        im, ma = data
        im, ma = self.augmentation_model((im, ma))
        with tf.GradientTape() as tape:
            ma_pred = self(im, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(ma, ma_pred, regularization_losses=self.losses)
        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(ma, ma_pred)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}
Describe the current behavior.
I expected training to succeed without error. The same code was tested on:
- CPU : No errors
- GPU : No errors
- TPU : Error
You can reproduce this error quickly: open https://colab.research.google.com/drive/1LhHj1FrkZE9QnFhY-NOO8mn7aiXhZgNh?usp=sharing and use Runtime → Run all.
- When I changed the augmentation model to plain Conv2D layers, the error disappeared (see the sketch below).
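For reference, here is a minimal, hypothetical sketch of the two variants. The actual create_augmentation_model lives only in the notebook and operates on an (image, mask) pair; this sketch shows just the image branch to illustrate which layers make the difference:

import tensorflow as tf

def augmentation_model_with_preprocessing(image_hw):
    # Keras preprocessing layers: RandomFlip/RandomRotation use stateless
    # random ops, which is where the "must be a compile-time constant"
    # error comes from when this runs inside the TPU-compiled train step.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=list(image_hw) + [3]),
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.1),
    ])

def augmentation_model_plain_conv2d(image_hw):
    # Same interface, but only Conv2D: no random ops, so the TPU compile succeeds.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=list(image_hw) + [3]),
        tf.keras.layers.Conv2D(3, 3, padding="same"),
    ])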
Describe the expected behavior.
The COLAB notebook runs without any error.
- Do you want to contribute a PR? (yes/no):
- If yes, please read this page for instructions
- Briefly describe your candidate solution (if contributing):
Standalone code to reproduce the issue.
https://colab.research.google.com/drive/1LhHj1FrkZE9QnFhY-NOO8mn7aiXhZgNh?usp=sharing
Source code / logs.
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.
InvalidArgumentError: 9 root error(s) found.
(0) INVALID_ARGUMENT: {{function_node __inference_train_function_692915}} Input 0 to node `sequential_augmentation_model/sequential_augmentation_layers/random_flip/stateless_random_flip_left_right/stateless_random_uniform/StatelessRandomUniformV2` with op StatelessRandomUniformV2 must be a compile-time constant.
XLA compilation requires that operator arguments that represent shapes or dimensions be evaluated to concrete values at compile time. This error means that a shape or dimension argument could not be evaluated at compile time, usually because the value of the argument depends on a parameter to the computation, on a variable, or on a stateful operation such as a random number generator.
[[{{node sequential_augmentation_model/sequential_augmentation_layers/random_flip/stateless_random_flip_left_right/stateless_random_uniform/StatelessRandomUniformV2}}]]
[[TPUReplicate/_compile/_1646634736830564460/_4]]
(1) INVALID_ARGUMENT: {{function_node __inference_train_function_692915}} Input 0 to node `sequential_augmentation_model/sequential_augmentation_layers/random_flip/stateless_random_flip_left_right/stateless_random_uniform/StatelessRandomUniformV2` with op StatelessRandomUniformV2 must be a compile-time constant.
XLA compilation requires that operator arguments that represent shapes or dimensions be evaluated to concrete values at compile time. This error means that a shape or dimension argument could not be evaluated at compile time, usually because the value of the argument depends on a parameter to the computation, on a variable, or on a stateful operation such as a random number generator.
[[{{node sequential_augmentation_model/sequential_augmentation_layers/random_flip/stateless_random_flip_left_right/stateless_random_uniform/StatelessRandomUniformV2}}]]
[[TPUReplicate/_compile/_1646634736830564460/_4]]
[[tpu_compile_succeeded_assert/_5094882425795608634/_5/_47]]
(2) INVALID_ARGUMENT: {{function_node __inference_train_function_692915}} Input 0 to node `sequential_augmentation_model/sequential_augmentation_layers/random_flip/stateless_random_flip_left_right/stateless_random_uniform/StatelessRandomUniformV2` with op StatelessRandomUniformV2 must be a compile-time constant.
XLA compilation requires that operator arguments that represent shapes or dimensions be evaluated to concrete values at compile time. This error means that a shape or dimension argument could not be evaluated at compile time, usually because the value of the argument depends on a parameter to the computation, on a variable, or on a stateful operation such as a random number generator.
[[{{node sequential_augmentation_model/sequential_augmentation_layers/random_flip/stateless_random_flip_left_right/stateless_random_uniform/StatelessRandomUniformV2}}]]
[[TPUReplicate/_compile/_1646634736830564460/_4]]
[[tpu_compile_succeeded_assert/_5094882425795608634/_5/_159]]
(3) INVALID_ARGUMENT: {{function_node __inference_train_function_692915}} Input 0 to node `sequential_augmentation_model/sequential_a ... [truncated]
NOTE: This issue came from https://github.com/tensorflow/tensorflow/issues/53051
Took a look. There may be something we need to change in how we set up randomness for the RandomFlip layer; I will look into this.
Overall, I think the preferred approach here would be to apply the preprocessing layers inside a tf.data.Dataset.map before the training step. This keeps all preprocessing running asynchronously on the CPU, which should be more efficient in this case. See this section of our guide, which explains the choice: https://keras.io/guides/preprocessing_layers/#preprocessing-data-before-the-model-or-inside-the-model
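A minimal sketch of that suggestion, assuming an aug_model that maps an (image, mask) pair to an augmented pair as in the code above (the dataset and batch-size names are placeholders, not from the notebook):

import tensorflow as tf

def make_train_dataset(images, masks, aug_model, batch_size=8):
    ds = tf.data.Dataset.from_tensor_slices((images, masks))
    ds = ds.batch(batch_size, drop_remainder=True)  # a static batch size helps on TPU
    # Augmentation runs here, on the host CPU, outside the TPU-compiled step.
    ds = ds.map(lambda im, ma: aug_model((im, ma)),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)

# With augmentation moved into the input pipeline, the custom train_step
# (and the augmentation call inside it) is no longer needed:
# seg_model.fit(make_train_dataset(train_images, train_masks, aug_model), epochs=10)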
Note that the op underlying RandomRotation does not have TPU support (which is why you have tf.config.set_soft_device_placement(True)), so you will be running partially on the CPU anyway. This blog post also shows an example of running preprocessing separately with tf.data and prefetching: https://blog.tensorflow.org/2021/11/an-introduction-to-keras-preprocessing.html
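For completeness, a sketch of the Colab TPU setup that soft-placement note refers to, using the standard TF 2.x TPU initialization (the notebook's auto_tpu decorator presumably wraps something similar):

import tensorflow as tf

# With soft device placement enabled, ops without a TPU kernel (e.g. the
# image rotation op behind RandomRotation) fall back to the CPU host
# instead of failing outright.
tf.config.set_soft_device_placement(True)

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Model construction would then happen under strategy.scope().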