Help with LSTM
I am new to Keras and I am trying to create a few toy examples to get to know it better. I was trying to implement an LSTM that takes two binary strings (b1 and b2) as input and returns the result of b1 OR b2. I know this is not what you would generally use an LSTM for, but I'd still like to try it to get familiar with the different LSTM architectures.
I've created a sequence-to-sequence model. Calling the binary strings a and b, and working with strings of 5 bits, we have the following architecture:
[y_0]      [y_1]      [y_2]      [y_i]      [y_n]
  |          |          |          |          |
  |          |          |          |          |
[h_0]----->[h_1]----->[h_2]----->[h_i]----->[h_n]
  |          |          |          |          |
  |          |          |          |          |
[a_0,b_0]  [a_1,b_1]  [a_2,b_2]  [a_i,b_i]  [a_n,b_n]
Where [a_i, b_i] corresponds to the ith bit of strings a and b. This way:
X_train = (100000, 5, 2) # [samples, time steps, features]
y_train = (100000, 5, 1)
X_test = (30000, 5, 2)
y_test = (30000, 5, 1)
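In case it helps, data like this can be generated with numpy along these lines (a rough sketch; I'm assuming the bits are drawn uniformly at random, and the function name is just illustrative):

import numpy as np

def make_or_data(n_samples, n_bits=5):
    # Two random binary strings per sample: [samples, time steps, features]
    X = np.random.randint(0, 2, size=(n_samples, n_bits, 2))
    # Elementwise OR of the two input bits at each time step
    y = np.bitwise_or(X[:, :, 0], X[:, :, 1])[:, :, None]
    return X.astype('float32'), y.astype('float32')

X_train, y_train = make_or_data(100000)
X_test, y_test = make_or_data(30000)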
I am creating the LSTM and fitting it with:
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

model = Sequential()
# One LSTM layer over the 5 time steps, 2 features per step; return the
# hidden state at every step so we can predict one output bit per step
model.add(LSTM(5, input_dim=2, input_length=5, return_sequences=True))
# Apply the same Dense layer independently at each time step
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='rmsprop', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=100, nb_epoch=10,
          validation_data=(X_test, y_test))
However, this doesn't converge; this is the output:
Loading data...
100000 train sequences
30000 test sequences
Data Shapes:
X_train: (100000, 5, 2)
y_train: (100000, 5, 1)
X_test: (30000, 5, 2)
y_test: (30000, 5, 1)
Build model...
Train...
Train on 100000 samples, validate on 30000 samples
Epoch 1/10
100000/100000 [==============================] - 8s - loss: 0.3251 - acc: 0.5748 - val_loss: 0.2094 - val_acc: 0.6819
Epoch 2/10
100000/100000 [==============================] - 9s - loss: 0.1963 - acc: 0.6852 - val_loss: 0.1860 - val_acc: 0.7068
Epoch 3/10
100000/100000 [==============================] - 9s - loss: 0.1772 - acc: 0.7252 - val_loss: 0.1706 - val_acc: 0.7396
Epoch 4/10
100000/100000 [==============================] - 8s - loss: 0.1686 - acc: 0.7394 - val_loss: 0.1665 - val_acc: 0.7453
Epoch 5/10
100000/100000 [==============================] - 8s - loss: 0.1654 - acc: 0.7425 - val_loss: 0.1639 - val_acc: 0.7455
Epoch 6/10
100000/100000 [==============================] - 9s - loss: 0.1634 - acc: 0.7428 - val_loss: 0.1620 - val_acc: 0.7457
Epoch 7/10
100000/100000 [==============================] - 8s - loss: 0.1615 - acc: 0.7471 - val_loss: 0.1601 - val_acc: 0.7511
Epoch 8/10
100000/100000 [==============================] - 9s - loss: 0.1594 - acc: 0.7545 - val_loss: 0.1581 - val_acc: 0.7557
Epoch 9/10
100000/100000 [==============================] - 8s - loss: 0.1576 - acc: 0.7553 - val_loss: 0.1566 - val_acc: 0.7573
Epoch 10/10
100000/100000 [==============================] - 8s - loss: 0.1564 - acc: 0.7554 - val_loss: 0.1559 - val_acc: 0.7585
29900/30000 [============================>.] - ETA: 0s
Test score: 0.155859013249
Test accuracy: 0.758480000496
Any clues on what I am doing wrong? I’ve tried tweaking the hyperparams without much success. Should I be working with another LSTM architecture?
Binary operations such as OR, AND, XOR, etc. are not good examples for RNNs, since there is no sequence / time dependency. Take a look at this page:
http://www.xcprod.com/titan/XCSB-DOC/binary_or.html
You can see that each result bit depends only on the two bits being OR'd, and not on any surrounding bits. An RNN should be able to learn it, but it makes a better example for static NNs; that's why XOR is the usual 'hello world' example.
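To make that concrete, here is a minimal sketch (my own, not from the linked page) of a static network learning OR one bit position at a time; since each output bit depends only on the two input bits at the same position, the four possible input combinations are the whole problem:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# The four possible (a_i, b_i) combinations and their OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype='float32')
y = np.array([[0], [1], [1], [1]], dtype='float32')

model = Sequential()
model.add(Dense(4, input_dim=2, activation='tanh'))  # tiny hidden layer
model.add(Dense(1, activation='sigmoid'))            # one output bit
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
model.fit(X, y, nb_epoch=500, verbose=0)             # four samples, many epochs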
Binary addition is a better exercise, since it involves a carry bit that depends on the previous step in the sequence. This one is in numpy, not Keras, but see this page and scroll down to 'Our Toy Code':
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/
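If you want to reuse your Keras setup from above, the addition data could be generated along these lines (a rough sketch; the bit width, the helper names, and the choice to feed bits least-significant first so the carry travels in the direction the LSTM reads are my own assumptions):

import numpy as np

def make_add_data(n_samples, n_bits=8):
    # Keep summands small enough that the sum still fits in n_bits
    a = np.random.randint(0, 2 ** (n_bits - 1), size=n_samples)
    b = np.random.randint(0, 2 ** (n_bits - 1), size=n_samples)
    c = a + b
    # Unpack each number into its bits, least-significant bit first
    to_bits = lambda v: (v[:, None] >> np.arange(n_bits)) & 1
    X = np.stack([to_bits(a), to_bits(b)], axis=-1)  # [samples, steps, 2]
    y = to_bits(c)[:, :, None]                       # [samples, steps, 1]
    return X.astype('float32'), y.astype('float32')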
On a separate note, are you generating the dataset at random? When n = 5, there are only 2^5 = 32 possible 5-bit strings, so the number of distinct OR operations on two 5-bit numbers is 32^2 = 1024; that's the maximum size of your dataset. For low n, you could generate the entire set and then split it into train / test / validation. By the time you get to n = 9 or so, it will probably be worth going back to random sampling.
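For example, the full set of 1024 OR operations for n = 5 can be enumerated directly (a sketch with itertools; the variable names are illustrative):

import itertools
import numpy as np

n_bits = 5
# All 2^5 = 32 possible 5-bit strings
strings = list(itertools.product([0, 1], repeat=n_bits))
# All 32^2 = 1024 ordered pairs of strings
pairs = list(itertools.product(strings, strings))

X = np.array([np.stack([a, b], axis=-1) for a, b in pairs])           # (1024, 5, 2)
y = np.array([[[ai | bi] for ai, bi in zip(a, b)] for a, b in pairs]) # (1024, 5, 1)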