TimeDistributed layer does not correctly pass on mask
A well-known use case for the TimeDistributed layer is building a hierarchical LSTM. For instance, one can first run an LSTM over the words in each sentence and then an LSTM over the sentences. The same applies to an LSTM over the characters in each word and then over the words, and so on.
This Keras example shows this functionality nicely. What is not obvious, however, is that the mask is not correctly passed on when using the TimeDistributed layer. This is critical, since sentences will often not all have the same number of words (nor all words the same number of characters).
To illustrate the issue, I've modified the MNIST Hierarchical RNN example by removing the right half of all the images and adding a Masking layer (see below). Now add if mask is None: raise ValueError()
to https://github.com/fchollet/keras/blob/master/keras/layers/recurrent.py#L198 and you'll see that the mask is not passed on.
This happens without any warning whatsoever, so the user is unaware of the behavior. How can we modify the TimeDistributed wrapper so that it correctly passes the mask on to the lower level?
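To make the expected behavior concrete, here is a plain-Python sketch (no Keras required; shapes mirror the MNIST script below) of the mask that Masking computes and the reshape that TimeDistributed would need to apply before handing it to the inner LSTM:

```python
# Shapes mirror the script below: (batch, rows, cols, pixel) with pixel == 1,
# and the right half of every row zeroed out, as in the modified MNIST data.
batch, rows, cols = 2, 4, 6
x = [[[[1.0] if c < cols // 2 else [0.0] for c in range(cols)]
      for _ in range(rows)] for _ in range(batch)]

# Masking(mask_value=0.) marks a timestep as valid when any feature differs
# from mask_value. Applied per row, the mask has shape (batch, rows, cols).
mask = [[[any(v != 0.0 for v in x[b][r][c]) for c in range(cols)]
         for r in range(rows)] for b in range(batch)]

# TimeDistributed feeds the inner LSTM inputs of shape (batch * rows, cols,
# pixel), so the mask would have to be reshaped to (batch * rows, cols)
# alongside them. This reshape-and-forward step is what the wrapper skips.
inner_mask = [mask[b][r] for b in range(batch) for r in range(rows)]
```

With the mask forwarded this way, each inner LSTM call would see the last cols // 2 timesteps of its row as padding instead of real input.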
from __future__ import print_function
from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Input, Dense, TimeDistributed, Masking
from keras.layers import LSTM
from keras.utils import np_utils
# Training parameters.
batch_size = 32
nb_classes = 10
nb_epochs = 5
# Embedding dimensions.
row_hidden = 128
col_hidden = 128
# The data, shuffled and split between train and test sets.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Reshapes data to 4D for Hierarchical RNN.
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# ADDED: Remove pixel values for right half of the image
# This is similar to the use case of running an LSTM over
# multiple sentences, where each sentence has some masking.
X_train[:,:,14:] = 0
X_test[:,:,14:] = 0
# Converts class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
row, col, pixel = X_train.shape[1:]
# 4D input.
x = Input(shape=(row, col, pixel))
#x = Input(batch_shape=(batch_size, row, col, pixel))
# ADDED: Masking layer to take into account that right
# half of image is removed.
x_masked = TimeDistributed(Masking())(x)
# Encodes a row of pixels using TimeDistributed Wrapper.
encoded_rows = TimeDistributed(LSTM(output_dim=row_hidden))(x_masked)
# Encodes columns of encoded rows.
encoded_columns = LSTM(col_hidden)(encoded_rows)
# Final predictions and model.
prediction = Dense(nb_classes, activation='softmax')(encoded_columns)
model = Model(input=x, output=prediction)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# Training.
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epochs,
          verbose=1, validation_data=(X_test, Y_test))
# Evaluation.
scores = model.evaluate(X_test, Y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
Issue Analytics
- Created 7 years ago
- Reactions: 15
- Comments: 9
I solved this by simply creating a new layer that inherits from TimeDistributed and passes the mask on. (Note: for legacy reasons I had to stay on Keras 2.2.4; I don't know whether this has been fixed since, but I hope so.) This allows you to have Masking layers before this one.
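The fix described above amounts to overriding compute_mask so the wrapper returns the incoming mask instead of dropping it. A minimal sketch of that pattern, using a stand-in base class so the snippet is self-contained (in real code you would subclass keras.layers.TimeDistributed and keep the rest of its behavior):

```python
class TimeDistributed:
    """Stand-in for keras.layers.TimeDistributed. In the affected Keras
    versions the wrapper effectively drops the mask at this point."""
    def compute_mask(self, inputs, mask=None):
        return None


class TimeDistributedWithMask(TimeDistributed):
    """Hypothetical subclass: forward the incoming mask unchanged so that
    downstream layers (e.g. the outer LSTM) still receive it."""
    def compute_mask(self, inputs, mask=None):
        return mask
```

Using the subclass in place of the stock wrapper keeps the mask alive past the wrapper, which is what lets Masking layers placed before it take effect.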
@amirveyseh the problem happens when you use an Embedding layer
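The Embedding case is the same problem in a different guise: with mask_zero=True, Embedding derives the mask from the integer inputs (0 means padding), and a TimeDistributed wrapper around it drops that mask just as it drops the one from Masking. A small sketch with hypothetical word ids:

```python
# Hypothetical batch of padded sentences: each row is a list of word ids,
# where 0 is the padding id, as with Embedding(..., mask_zero=True).
sentences = [[5, 3, 9, 0, 0],
             [7, 0, 0, 0, 0]]

# Embedding(mask_zero=True) masks exactly the positions where the input is 0.
mask = [[word_id != 0 for word_id in row] for row in sentences]
# A TimeDistributed(Embedding(..., mask_zero=True)) would need to forward
# this mask to downstream layers, and does not.
```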