TimeDistributed layer does not correctly pass on mask
A well-known use case for the TimeDistributed layer is building a hierarchical LSTM. For instance, one can first run an LSTM over the words in each sentence and then an LSTM over the sentences. The same applies to an LSTM over the characters in each word and then over the words, and so on.
This Keras example shows this functionality nicely. What is not obvious, however, is that the mask is not correctly passed on when using the TimeDistributed layer. This is critical, since sentences will often not all have the same number of words (nor all words the same number of characters).
To illustrate the issue, I've modified the MNIST Hierarchical RNN example by removing the right half of all the images and adding a Masking layer (see below). Now add if mask is None: raise ValueError()
to https://github.com/fchollet/keras/blob/master/keras/layers/recurrent.py#L198 and you'll see that the mask is not passed on.
This happens without any warning whatsoever, so the user is unaware of the behavior. How can we modify the TimeDistributed wrapper so that it correctly passes the mask on to the lower level?
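To make the expected behavior concrete, here is a plain-Python sketch (no Keras required; shapes mirror the MNIST script below) of the mask that Masking computes and the reshape that TimeDistributed would need to apply before handing it to the inner LSTM:

```python
# Shapes mirror the script below: (batch, rows, cols, pixel) with pixel == 1,
# and the right half of every row zeroed out, as in the modified MNIST data.
batch, rows, cols = 2, 4, 6
x = [[[[1.0] if c < cols // 2 else [0.0] for c in range(cols)]
      for _ in range(rows)] for _ in range(batch)]

# Masking(mask_value=0.) marks a timestep as valid when any feature differs
# from mask_value. Applied per row, the mask has shape (batch, rows, cols).
mask = [[[any(v != 0.0 for v in x[b][r][c]) for c in range(cols)]
         for r in range(rows)] for b in range(batch)]

# TimeDistributed feeds the inner LSTM inputs of shape (batch * rows, cols,
# pixel), so the mask would have to be reshaped to (batch * rows, cols)
# alongside them. This reshape-and-forward step is what the wrapper skips.
inner_mask = [mask[b][r] for b in range(batch) for r in range(rows)]
```

With the mask forwarded this way, each inner LSTM call would see the last cols // 2 timesteps of its row as padding instead of real input.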
from __future__ import print_function
from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Input, Dense, TimeDistributed, Masking
from keras.layers import LSTM
from keras.utils import np_utils
# Training parameters.
batch_size = 32
nb_classes = 10
nb_epochs = 5
# Embedding dimensions.
row_hidden = 128
col_hidden = 128
# The data, shuffled and split between train and test sets.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Reshapes data to 4D for Hierarchical RNN.
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# ADDED: Remove pixel values for right half of the image
# This is similar to the use case of running an LSTM over
# multiple sentences, where each sentence has some masking.
X_train[:,:,14:] = 0
X_test[:,:,14:] = 0
# Converts class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
row, col, pixel = X_train.shape[1:]
# 4D input.
x = Input(shape=(row, col, pixel))
#x = Input(batch_shape=(batch_size, row, col, pixel))
# ADDED: Masking layer to take into account that right
# half of image is removed.
x_masked = TimeDistributed(Masking())(x)
# Encodes a row of pixels using TimeDistributed Wrapper.
encoded_rows = TimeDistributed(LSTM(output_dim=row_hidden))(x_masked)
# Encodes columns of encoded rows.
encoded_columns = LSTM(col_hidden)(encoded_rows)
# Final predictions and model.
prediction = Dense(nb_classes, activation='softmax')(encoded_columns)
model = Model(input=x, output=prediction)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# Training.
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epochs,
          verbose=1, validation_data=(X_test, Y_test))
# Evaluation.
scores = model.evaluate(X_test, Y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
Issue Analytics
- Created 7 years ago
- Reactions: 15
- Comments: 9
I solved this by simply creating a new layer that inherits from TimeDistributed and passes the mask on. (Note: for legacy reasons I had to stay on Keras 2.2.4; I don't know whether this has been fixed since, but I hope so.) This allows you to have Masking layers before this one.
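The fix described above amounts to overriding compute_mask so the wrapper returns the incoming mask instead of dropping it. A minimal sketch of that pattern, using a stand-in base class so the snippet is self-contained (in real code you would subclass keras.layers.TimeDistributed and keep the rest of its behavior):

```python
class TimeDistributed:
    """Stand-in for keras.layers.TimeDistributed. In the affected Keras
    versions the wrapper effectively drops the mask at this point."""
    def compute_mask(self, inputs, mask=None):
        return None


class TimeDistributedWithMask(TimeDistributed):
    """Hypothetical subclass: forward the incoming mask unchanged so that
    downstream layers (e.g. the outer LSTM) still receive it."""
    def compute_mask(self, inputs, mask=None):
        return mask
```

Using the subclass in place of the stock wrapper keeps the mask alive past the wrapper, which is what lets Masking layers placed before it take effect.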
@amirveyseh the problem happens when you use an Embedding layer
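The Embedding case is the same problem in a different guise: with mask_zero=True, Embedding derives the mask from the integer inputs (0 means padding), and a TimeDistributed wrapper around it drops that mask just as it drops the one from Masking. A small sketch with hypothetical word ids:

```python
# Hypothetical batch of padded sentences: each row is a list of word ids,
# where 0 is the padding id, as with Embedding(..., mask_zero=True).
sentences = [[5, 3, 9, 0, 0],
             [7, 0, 0, 0, 0]]

# Embedding(mask_zero=True) masks exactly the positions where the input is 0.
mask = [[word_id != 0 for word_id in row] for row in sentences]
# A TimeDistributed(Embedding(..., mask_zero=True)) would need to forward
# this mask to downstream layers, and does not.
```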