Stacking Convolutions and LSTM
I would like to stack 2D convolutions and LSTM layers, which is exactly the problem described in #129.
The solution proposed in #129 is a custom reshape layer. By now, Keras ships a built-in Reshape layer. Searching for this problem on Stack Overflow brings up a similar question, whose accepted answer suggests using the built-in layer.
As a toy example, I would like to classify MNIST with a combination of Conv layers and an LSTM. I've sliced the images into four parts and arranged those parts into sequences, then stacked the sequences. My training data is a numpy array with shape [60000, 4, 1, 56, 14], where:
- 60000 is the number of samples
- 4 is the number of timesteps
- 1 is the number of color channels (I'm using the Theano image layout)
- 56 and 14 are width and height
Please note: each image slice is 14x14, since I've cut the 28x28 image into four parts. The 56 in the shape appears because I've created 4 different sequences and stacked them along this axis.
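The slicing described above can be sketched roughly like this (a minimal sketch using plain numpy; the quadrant ordering and the stacking along the width axis are assumptions, since the issue does not spell them out):

```python
import numpy as np

# toy batch of MNIST-like images: (num_samples, 28, 28)
images = np.random.rand(8, 28, 28).astype("float32")

def to_sequence(imgs):
    # cut each 28x28 image into four 14x14 quadrants and treat them
    # as a sequence of 4 timesteps
    tl = imgs[:, :14, :14]
    tr = imgs[:, :14, 14:]
    bl = imgs[:, 14:, :14]
    br = imgs[:, 14:, 14:]
    seq = np.stack([tl, tr, bl, br], axis=1)  # (N, 4, 14, 14)
    return seq[:, :, np.newaxis, :, :]        # add channel axis: (N, 4, 1, 14, 14)

seq = to_sequence(images)
print(seq.shape)  # (8, 4, 1, 14, 14)

# stacking four such sequences along the width axis yields the
# (N, 4, 1, 56, 14) layout described above
stacked = np.concatenate([seq] * 4, axis=3)
print(stacked.shape)  # (8, 4, 1, 56, 14)
```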
Here is my code so far:

```python
from keras.models import Sequential
from keras.layers import Convolution2D, Activation, MaxPooling2D, Reshape, Dropout, LSTM, Dense

nb_filters = 32
kernel_size = (3, 3)
pool_size = (2, 2)
nb_classes = 10
batch_size = 64

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid", input_shape=[1, 56, 14]))
model.add(Activation("relu"))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Reshape((56*14,)))
model.add(Dropout(0.25))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))
```
When run, the Reshape layer raises a ValueError:

```
ValueError: total size of new array must be unchanged
```
I also tried passing the number of timesteps to the Reshape layer: `model.add(Reshape((4, 56*14)))`
But that doesn’t solve the problem either.
What is the correct shape to give to the Reshape layer? Is a Reshape layer the correct solution at all?
I've posted the same question on Stack Overflow.
Issue Analytics
- Created: 7 years ago
- Comments: 18 (2 by maintainers)
If I understand you correctly, you want to feed the output of the convolutional layers, which carries no time-sequence information, into an LSTM layer. The way to do this is to divide it into timesteps, either one element at a time or in batches. Since your feature map has a total dimensionality of 32*26*5 = 4160, you could, for example, treat every element of it as a timestep in a sequence. To do this, reshape it with `Reshape((4160, 1))`. To sequence N elements at a time, use `Reshape((4160/N, N))`, where N is an integer divisor of 4160.

Amazing, this seems to do the trick.
I wrapped everything up to the LSTM in a TimeDistributed layer and provided the number of timesteps as an additional input dimension. This runs without problems.
Am I doing this the right way?
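The TimeDistributed approach described above could look roughly like this (a minimal sketch using the current Keras API; the layer sizes, such as `LSTM(5)`, follow the earlier code, and folding the ReLU into the `Conv2D` layers is a simplification of mine):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Flatten, LSTM, Dense)

model = Sequential()
# 4 timesteps, each a (1, 56, 14) channels-first image slice
model.add(TimeDistributed(
    Conv2D(32, (3, 3), activation="relu", data_format="channels_first"),
    input_shape=(4, 1, 56, 14)))
model.add(TimeDistributed(
    Conv2D(32, (3, 3), activation="relu", data_format="channels_first")))
model.add(TimeDistributed(
    MaxPooling2D(pool_size=(2, 2), data_format="channels_first")))
# flatten each timestep's (32, 26, 5) feature map into a 4160-vector
model.add(TimeDistributed(Flatten()))
model.add(LSTM(5))  # now sees 4 timesteps of 4160 features each
model.add(Dense(50))
model.add(Dense(10, activation="softmax"))

print(model.output_shape)  # (None, 10)
```

Here the convolutions run independently on every timestep, so the LSTM receives one feature vector per image slice rather than per pixel.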