
Reducing overfitting with embedding weights??

See original GitHub issue
  • Check that you are up-to-date with the master branch of Keras. You can update with: pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with: pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps

Hi, I’m working on a text prediction task using my pretrained word embeddings. The current model is severely overfitting, with an increasing val loss. I need your advice on how to mitigate this and train the network properly. I’m using the embedding weights in the input layer. The embeddings were trained on a larger corpus; the training corpus for this model is a subset of it. For out-of-vocabulary words I’m using glorot_uniform(270,) to get random embeddings.
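For context, a minimal sketch of how such an embedding matrix could be assembled; word2vec, word2vec_vocab, and oov_words are assumed names (not code from the issue), and the uniform bound mimics glorot_uniform on a (270,) shape:

import numpy as np

w2v_dim = 270
vocab = list(word2vec_vocab) + list(oov_words)             # combined index space (assumed names)
embed_weight = np.zeros((len(vocab), w2v_dim), dtype='float32')

limit = np.sqrt(6.0 / (w2v_dim + w2v_dim))                 # approximates glorot_uniform((270,))
for i, word in enumerate(vocab):
    if word in word2vec:                                   # pretrained vector available
        embed_weight[i] = word2vec[word]
    else:                                                  # out-of-vocabulary word
        embed_weight[i] = np.random.uniform(-limit, limit, size=w2v_dim)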

Q1- Is it because my network has far more parameters than the training set size?

Q2- Am I using the embeddings correctly? Is there a problem with the embedding weights?

Q3- What else can I try?

Things I’ve tried:

  • Passing the training data as validation data as a sanity check; the val loss decreases.
  • Adding regularization to all layers (see the in-LSTM dropout sketch just after this list).
  • Reducing LSTM cells from 256 to 128.
  • Varying the input sequence length from 5 to 32.
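
One more regularization knob that could be tried is dropout inside the LSTM itself. Assuming a Keras 1.x version that supports the dropout_W/dropout_U arguments, the LSTM line in the model further down could be varied along these lines (the 0.2 rates are assumed, not from the issue):

model.add(LSTM(memory_units, return_sequences=False, init="orthogonal",
               dropout_W=0.2, dropout_U=0.2))   # dropout on the input and recurrent connections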

My model

  • Trained a word2vec model to get word embeddings.
  • X_train: sequences of words (indices mapped via the word2vec + OOV dictionary).
  • Using X_train to get the embedding weights from the word2vec model, i.e. embed[i] = word2vec[X_train[i]] (a sketch of this layout follows the shape summary below).
  • y_train: one-hot target vectors with a 1 at the position of the next word w.r.t. the corpus indices, i.e. not using the word2vec + OOV dictionary.
Corpus size: 46265, Corpus vocab size: 22120
Out of vocab words:  15248
Vocab size of Word2vec model + oov words: 34444
Train corpus size:  46169
Test corpus size:  96
X_train.shape:  (46137, 32)
y_train.shape:  (46137, 22120)
embed_weight.shape:  (34444, 270)
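
To make that layout concrete, a hypothetical sketch; the sliding_windows, w2v_oov_index, and corpus_index helpers are assumptions, not code from the issue:

import numpy as np

num_samples = 46137
seq_length = 32

X_train = np.zeros((num_samples, seq_length), dtype='int32')
y_train = np.zeros((num_samples, 22120), dtype='float32')      # corpus vocab size

for s, (window, next_word) in enumerate(sliding_windows(corpus, seq_length)):
    X_train[s] = [w2v_oov_index[w] for w in window]            # rows index into embed_weight
    y_train[s, corpus_index[next_word]] = 1.0                  # one-hot over the corpus vocabulary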

# Keras 1.x-style model definition (imports, Sequential(), and the l2 strengths from the printout below added for completeness)
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense
from keras.regularizers import l2

w2v_dim = 270
seq_length = 32
corpus_vocab_size = 22120   # number of unique words in the corpus
memory_units = 128
l2_emb = l2_lstm = l2_dense = 0.7

model = Sequential()
model.add(Embedding(embed_weight.shape[0], embed_weight.shape[1], mask_zero=False,
                    weights=[embed_weight], input_length=seq_length, W_regularizer=l2(l2_emb)))
model.add(LSTM(memory_units, return_sequences=False, init="orthogonal", W_regularizer=l2(l2_lstm)))
model.add(Dropout(0.5))
model.add(Dense(corpus_vocab_size, activation='softmax', init="orthogonal", W_regularizer=l2(l2_dense)))

Compiling Model
l2_emb:  0.7  l2_lstm:  0.7  l2_dense:  0.7  Dropout:  0.5

Fitting model
Train on 36909 samples, validate on 9228 samples
('lr:', array(0.0010000000474974513, dtype=float32))
Epoch 1/40
36909/36909 [==============================] - 135s - loss: 300351.6275 - acc: 0.0223 - val_loss: 9.8617 - val_acc: 0.0237
('lr:', array(0.0010000000474974513, dtype=float32))
Epoch 2/40
36909/36909 [==============================] - 134s - loss: 35422.2196 - acc: 0.0231 - val_loss: 9.9594 - val_acc: 0.0237
('lr:', array(0.0010000000474974513, dtype=float32))
Epoch 3/40
36909/36909 [==============================] - 135s - loss: 3996.1297 - acc: 0.0231 - val_loss: 10.1249 - val_acc: 0.0237
('lr:', array(0.0010000000474974513, dtype=float32))
Epoch 4/40
36909/36909 [==============================] - 134s - loss: 254.4328 - acc: 0.0229 - val_loss: 10.3708 - val_acc: 0.0237
('lr:', array(0.0010000000474974513, dtype=float32))
.
.
.
Epoch 38/40
36909/36909 [==============================] - 134s - loss: 9.0427 - acc: 0.0231 - val_loss: 11.7586 - val_acc: 0.0237
('lr:', array(0.0005000000237487257, dtype=float32))
Epoch 39/40
36909/36909 [==============================] - 134s - loss: 9.0425 - acc: 0.0231 - val_loss: 11.7566 - val_acc: 0.0237
('lr:', array(0.0005000000237487257, dtype=float32))
Epoch 40/40
36909/36909 [==============================] - 134s - loss: 9.0423 - acc: 0.0231 - val_loss: 11.7524 - val_acc: 0.0237

@braingineer @farizrahman4u @carlthome your two cents??

Need advice. Thanks!

Edit: optimizer=adam, loss=‘categorical_crossentropy’
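
For completeness, a minimal sketch of that compile/fit step in the Keras 1.x API; the validation split is inferred from the 36909/9228 split in the log, and the learning-rate schedule implied by the lr printouts is omitted:

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, nb_epoch=40, batch_size=32, validation_split=0.2)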

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

7 reactions
MaratZakirov commented, Jan 20, 2017

I must say that, in my opinion, using an Embedding layer always leads to over-fitting, simply because the Embedding layer DRAMATICALLY increases the number of free parameters to learn. Just suppose you have a 500K vocabulary with 100 floats per word. In the other case, when you use a pretrained and FIXED word2vec representation, the number of free parameters is just the number of free parameters of your NN, which is often quite small.
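
A minimal sketch of the fixed-embedding variant described here, assuming a Keras version where a layer's trainable attribute can be set before compiling (regularizers dropped for brevity):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense

embedding = Embedding(embed_weight.shape[0], embed_weight.shape[1],
                      weights=[embed_weight], input_length=seq_length)
embedding.trainable = False          # keep the pretrained vectors fixed; only the LSTM/Dense learn

model = Sequential()
model.add(embedding)
model.add(LSTM(memory_units, return_sequences=False))
model.add(Dropout(0.5))
model.add(Dense(corpus_vocab_size, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')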

0 reactions
jerheff commented, Oct 27, 2016

@carlthome I could be missing something, but isn’t the Embedding layer, as specified in the comment, learned in the model? It seems to me that attaching this directly to randomly initialized layers would be a bad idea until those layers settle down.

On the other hand, if the word2vec transform is happening outside of the model (and not learnable) then it is not something to discuss.
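
A sketch of that "transform outside the model" case, where the word2vec lookup is done in NumPy and the fixed vectors are fed straight into the LSTM; X_train here is the integer index matrix described earlier, and imports are as in the model definition above:

X_train_vec = embed_weight[X_train]     # (samples, seq_length, w2v_dim); no Embedding layer at all

model = Sequential()
model.add(LSTM(memory_units, input_shape=(seq_length, w2v_dim)))
model.add(Dropout(0.5))
model.add(Dense(corpus_vocab_size, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_train_vec, y_train, nb_epoch=40, validation_split=0.2)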

Read more comments on GitHub >
