WeightNormalization doesn't work with @tf.function in an edge case
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.15.2 (19C57)
- TensorFlow version and how it was installed (source or binary): 2.1.0 via pip
- TensorFlow-Addons version and how it was installed (source or binary): 0.8.0-dev via pip
- Python version: 3.7.4
- Is GPU used? (yes/no): no
Describe the bug
When the train function is decorated with @tf.function, using multiple WeightNormalization layers in a multi-output model causes the training process to hang without producing any meaningful output or logs.
The model works if any of the following is done:
- The @tf.function decorator is commented out (see the sketch after this list)
- Only one WeightNormalization layer is used
- Only one output is used in the InnerModel
I don't really know what happens, so I am posting the full code to reproduce the issue below. The issue is also reproducible on Linux (Ubuntu 16.04) with a GPU (RTX 2080, CUDA 10.1) enabled.
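As a quick check of the first workaround, tf.function tracing can also be disabled globally instead of commenting out the decorator; a minimal sketch using the TF 2.1 experimental flag:

import tensorflow as tf

# Roughly equivalent to commenting out @tf.function: functions decorated with
# tf.function are executed eagerly instead of being traced into a graph.
tf.config.experimental_run_functions_eagerly(True)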
Code to reproduce the issue
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np

class InnerModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.model1 = tf.keras.Sequential([
            tfa.layers.WeightNormalization(
                tf.keras.layers.Conv2D(3, 1),
            ),
        ])
        self.model2 = tf.keras.Sequential([
            tfa.layers.WeightNormalization(
                tf.keras.layers.Conv2D(3, 1),
            ),
        ])

    def call(self, inputs):
        out1 = self.model1(inputs)
        out2 = self.model2(out1)
        return out1, out2

class OuterModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.model1 = InnerModel()
        self.model2 = InnerModel()
        self.downsample = tf.keras.layers.AveragePooling2D(2, 1, 'same')

    def call(self, inputs):
        init_out = self.model1(inputs)
        inputs = self.downsample(inputs)
        final_out = self.model2(inputs)
        return init_out, final_out

def loss_fn(out):
    loss = 0.
    for i in range(len(out)):
        for j in range(len(out[i])):
            loss += tf.reduce_mean(out[i][j])
    return loss

net = OuterModel()
opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(inputs):
    with tf.GradientTape() as tape:
        out = net(inputs)
        loss = loss_fn(out)
    grads = tape.gradient(loss, net.trainable_variables)
    opt.apply_gradients(zip(grads, net.trainable_variables))

def train():
    for i in range(20):
        x = np.random.randn(1, 32, 32, 3)
        train_step(x)
        print(f'step {i}')

train()
Other info / logs
2020-01-27 15:35:40.604091: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-27 15:35:40.624043: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc57e1f3f50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-27 15:35:40.624082: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:Layer outer_model is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because it's dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
https://github.com/tensorflow/addons/blob/6052477edcefad9691357e84312b127d5b57f4c0/tensorflow_addons/layers/wrappers.py#L62 seems to be the root cause. I suspect `tf.CriticalSection` honors neither the `name` nor the `shared_name` argument and instead creates its own unique critical section every time. Not only does this fail to prevent parallel execution of https://github.com/tensorflow/addons/blob/6052477edcefad9691357e84312b127d5b57f4c0/tensorflow_addons/layers/wrappers.py#L134-L135, it also creates a total mess with locks (and eventually a deadlock) somewhere deep inside. This tiny workaround https://github.com/failure-to-thrive/addons/commit/ec008b35e89d810bc7d7b41960664f98859336ea works like a charm.

This should be fixed on nightly. Please let us know if it is not: https://github.com/tensorflow/addons/pull/1190
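For illustration, a minimal sketch of the pattern described in the analysis above; this is a hypothetical distillation, not the actual wrappers.py code. It assumes each WeightNormalization layer ends up holding its own critical section created at construction time, with both per-layer execute() calls traced into a single function and no ordering constraint between them.

import tensorflow as tf

# Hypothetical distillation of the suspected pattern: two independent
# critical sections (one per layer), each guarding its own init step,
# traced into one function with no ordering between the execute() ops.
cs_a = tf.CriticalSection(name="init_mutex_a")
cs_b = tf.CriticalSection(name="init_mutex_b")

@tf.function
def traced_step():
    a = cs_a.execute(lambda: tf.constant(1.0))
    b = cs_b.execute(lambda: tf.constant(2.0))
    return a + b

print(traced_step())

This toy version runs on its own; per the analysis above, the trouble appears once the interleaved lock ops interact with the layers' real data-dependent initialization.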