When Dropout layer is shared Siamese-style, dropped units are not synchronized.
When sharing Dropout layers Siamese-style, I wasn't able to synchronize dropped units. For example, in the code below, noise_shape has no effect. The seed parameter has no effect either. Shared Dropout layers should synchronize dropped units by default, otherwise they are not shared in any meaningful way.
# https://gist.github.com/ozabluda/bbc6c84c0e69bfd9ca55170fd3ab040d
# https://github.com/keras-team/keras/issues/8802
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Input, Lambda
from keras import backend as K
import numpy as np
m = Sequential([
    Dropout(rate=0.5, input_shape=(1,), noise_shape=(1, 1))
])
m.summary()
input_a = Input(shape=(1,))
input_b = Input(shape=(1,))
processed_a = m(input_a)
processed_b = m(input_b)
def l1_distance(tensors):
    # the Lambda layer passes the two branch outputs as a list
    x1, x2 = tensors
    return K.sum(K.abs(x1 - x2), axis=1)
c = Lambda(l1_distance, output_shape=(1,))([processed_a, processed_b])
s = Model([input_a, input_b], c)
s.compile(optimizer='sgd', loss='mse')
s.summary()
x0 = np.array([1])
x1 = np.array([1])
x = [x0,x1]
y = np.array([0])
s.fit(x, y, verbose=1, epochs=10)
print(s.evaluate(x,y), s.predict(x))
Output:
Epoch 1/10
1/1 [==============================] - 1s 1s/step - loss: 0.0000e+00
Epoch 2/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 3/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 4/10
1/1 [==============================] - 0s 4ms/step - loss: 0.0000e+00
Epoch 5/10
1/1 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 6/10
1/1 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 7/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 8/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 9/10
1/1 [==============================] - 0s 3ms/step - loss: 4.0000
Epoch 10/10
1/1 [==============================] - 0s 4ms/step - loss: 0.0000e+00
1/1 [==============================] - 0s 14ms/step
(0.0, array([ 0.], dtype=float32))
Note that:
- during inference nothing is dropped, so the loss is 0, as expected
- if the dropout rate is 0, the loss is 0 during training, as expected
- on Epochs 2, 3, 7, 8, and 9 the loss is 4, which I don't understand at all. Maybe another bug; I need to investigate further.
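One plausible reading of the 0/4 alternation (an assumption, not something confirmed in the thread): if the two branches draw independent masks, then with rate=0.5 and an input of 1, a kept unit is rescaled to 1/(1-0.5) = 2 while a dropped unit becomes 0, so whenever exactly one branch drops, the L1 distance is 2 and the squared error against the target 0 is 4; when both branches keep or both drop, it is 0. A quick sketch of that arithmetic:

# Enumerate the four kept/dropped combinations for the two branches, assuming
# independent masks, rate=0.5, a single input of 1, a target of 0, and MSE loss.
rate = 0.5
keep_scale = 1.0 / (1.0 - rate)              # surviving units are rescaled to 2.0
for mask_a in (1, 0):
    for mask_b in (1, 0):
        out_a = mask_a * keep_scale          # branch output: 2.0 if kept, 0.0 if dropped
        out_b = mask_b * keep_scale
        l1 = abs(out_a - out_b)              # the Lambda layer's L1 distance
        print(mask_a, mask_b, (l1 - 0.0) ** 2)   # squared error vs. target 0 -> 0.0 or 4.0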
OK so I implemented a shared Dropout layer by myself, similar to what @fchollet suggested.
Example to call the layer:
It took me a few hours to implement/test this. So I am proud of this for a moment 😄
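For illustration only, here is a minimal sketch of one way a mask-sharing Dropout layer could be written against the Keras 2 backend API; the SharedDropout name and all of its details are assumptions, not the implementation referenced above. The idea is to draw a single Bernoulli mask and apply it to every branch:

from keras import backend as K
from keras.layers import Layer

class SharedDropout(Layer):
    """Applies one dropout mask to a list of same-shaped tensors (illustrative sketch)."""

    def __init__(self, rate, **kwargs):
        super(SharedDropout, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        keep_prob = 1.0 - self.rate
        # Draw a single Bernoulli mask from the first branch's shape and reuse it,
        # so every branch drops exactly the same units.
        mask = K.cast(K.random_uniform(K.shape(inputs[0])) < keep_prob, K.floatx())

        def dropped(x):
            return lambda: x * mask / keep_prob

        # At training time apply the shared mask (with inverted-dropout rescaling);
        # at inference pass the inputs through unchanged.
        return [K.in_train_phase(dropped(x), x, training=training) for x in inputs]

    def compute_output_shape(self, input_shape):
        return input_shape

# Hypothetical usage in the Siamese setup from the issue:
# processed_a, processed_b = SharedDropout(rate=0.5)([input_a, input_b])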
I think the issue @ozabluda posted is very good and important. In a Siamese network, applying techniques like Dropout and Batch Normalization can be risky. I tried this in a Siamese network myself and the performance was very bad!
Do you have any ideas on how to improve this yet, @ozabluda? Thanks!