When Dropout layer is shared Siamese-style, dropped units are not synchronized.
When sharing Dropout layers Siamese-style, I wasn't able to synchronize dropped units. For example, in the code below, noise_shape has no effect. The seed parameter has no effect either. Shared Dropout layers should synchronize dropped units by default, otherwise they are not shared in any meaningful way.
# https://gist.github.com/ozabluda/bbc6c84c0e69bfd9ca55170fd3ab040d
# https://github.com/keras-team/keras/issues/8802
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Input, Lambda
from keras import backend as K
import numpy as np
m = Sequential([
    Dropout(rate=0.5, input_shape=(1,), noise_shape=(1, 1))
])
m.summary()
input_a = Input(shape=(1,))
input_b = Input(shape=(1,))
processed_a = m(input_a)
processed_b = m(input_b)
def l1_distance(tensors):
    # the Lambda layer passes the two branch outputs as a list
    x1, x2 = tensors
    return K.sum(K.abs(x1 - x2), axis=1)
c = Lambda(l1_distance, output_shape=(1,))([processed_a, processed_b])
s = Model([input_a, input_b], c)
s.compile(optimizer='sgd', loss='mse')
s.summary()
x0 = np.array([1])
x1 = np.array([1])
x = [x0,x1]
y = np.array([0])
s.fit(x, y, verbose=1, epochs=10)
print(s.evaluate(x,y), s.predict(x))
Output:
Epoch 1/10
1/1 [==============================] - 1s 1s/step - loss: 0.0000e+00
Epoch 2/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 3/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 4/10
1/1 [==============================] - 0s 4ms/step - loss: 0.0000e+00
Epoch 5/10
1/1 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 6/10
1/1 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 7/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 8/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 9/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 10/10
1/1 [==============================] - 0s 4ms/step - loss: 0.0000e+00
1/1 [==============================] - 0s 14ms/step
(0.0, array([ 0.], dtype=float32))
Note that:
- during inference nothing is dropped, so the loss is 0, as expected
- if the dropout rate is 0, the loss is 0 during training, as expected
- on Epochs 2, 3, 7, 8, and 9 the loss is 4, which I don't understand at all. Maybe another bug; I need to investigate further.
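One plausible reading of the 0/4 alternation (an assumption, not something confirmed in the thread): if the two branches draw independent masks, then with rate=0.5 and an input of 1, a kept unit is rescaled to 1/(1-0.5) = 2 while a dropped unit becomes 0, so whenever exactly one branch drops, the L1 distance is 2 and the squared error against the target 0 is 4; when both branches keep or both drop, it is 0. A quick sketch of that arithmetic:

# Enumerate the four kept/dropped combinations for the two branches, assuming
# independent masks, rate=0.5, a single input of 1, a target of 0, and MSE loss.
rate = 0.5
keep_scale = 1.0 / (1.0 - rate)              # surviving units are rescaled to 2.0
for mask_a in (1, 0):
    for mask_b in (1, 0):
        out_a = mask_a * keep_scale          # branch output: 2.0 if kept, 0.0 if dropped
        out_b = mask_b * keep_scale
        l1 = abs(out_a - out_b)              # the Lambda layer's L1 distance
        print(mask_a, mask_b, (l1 - 0.0) ** 2)   # squared error vs. target 0 -> 0.0 or 4.0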
OK so I implemented a shared Dropout layer by myself, similar to what @fchollet suggested.
Example to call the layer:
It took me a few hours to implement/test this. So I am proud of this for a moment 😄
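For illustration only, here is a minimal sketch of one way a mask-sharing Dropout layer could be written against the Keras 2 backend API; the SharedDropout name and all of its details are assumptions, not the implementation referenced above. The idea is to draw a single Bernoulli mask and apply it to every branch:

from keras import backend as K
from keras.layers import Layer

class SharedDropout(Layer):
    """Applies one dropout mask to a list of same-shaped tensors (illustrative sketch)."""

    def __init__(self, rate, **kwargs):
        super(SharedDropout, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        keep_prob = 1.0 - self.rate
        # Draw a single Bernoulli mask from the first branch's shape and reuse it,
        # so every branch drops exactly the same units.
        mask = K.cast(K.random_uniform(K.shape(inputs[0])) < keep_prob, K.floatx())

        def dropped(x):
            return lambda: x * mask / keep_prob

        # At training time apply the shared mask (with inverted-dropout rescaling);
        # at inference pass the inputs through unchanged.
        return [K.in_train_phase(dropped(x), x, training=training) for x in inputs]

    def compute_output_shape(self, input_shape):
        return input_shape

# Hypothetical usage in the Siamese setup from the issue:
# processed_a, processed_b = SharedDropout(rate=0.5)([input_a, input_b])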
I think the issue @ozabluda posted is very good and important. In a Siamese network, applying techniques like Dropout and Batch Normalization can be risky. I tried this in a Siamese network myself and the performance was very bad!
Do you have any ideas on how to improve this yet, @ozabluda? Thanks!