
Resource exhausted error

See original GitHub issue

I have a dataset where the number of samples is 25000 and the number of features is 24995. I am trying to train a Keras autoencoder model on this data and I am facing an OOM error. Some specifics of the model are

Input matrix shape : (25000, 24995)

This input matrix is split into a training set and a testing set.

Train Matrix shape : (18750, 24995)
Test Matrix shape : (6250, 24995)
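
The original post does not show how this split was made. A minimal sketch that would produce the shapes above, assuming scikit-learn's train_test_split and a hypothetical input_matrix holding the full (25000, 24995) data:

from sklearn.model_selection import train_test_split

# 75% / 25% split -> (18750, 24995) for training and (6250, 24995) for testing
train_matrix, test_matrix = train_test_split(input_matrix, test_size=0.25, random_state=42)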

The code for training is

from keras.layers import Input, Dense
from keras.models import Model

input_layer = Input(shape=(train_matrix.shape[1],))

# Encoder: compress the 24995 input features down to 12500
encoding_hlayer1_dims = 12500
encoding_hlayer1 = Dense(encoding_hlayer1_dims, activation='relu', trainable=True, name="layer1")(input_layer)

# Decoder: reconstruct the original 24995 features
decoding_hlayer1 = Dense(train_matrix.shape[1], activation='relu')(encoding_hlayer1)

autoencoder = Model(input_layer, decoding_hlayer1)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

The summary of the model is

Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 24995)             0         
_________________________________________________________________
layer1 (Dense)               (None, 12500)             312450000 
_________________________________________________________________
dense_1 (Dense)              (None, 24995)             312462495 
=================================================================
Total params: 624,912,495
Trainable params: 624,912,495
Non-trainable params: 0
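
As a quick sanity check (editorial, not part of the original post), the parameter counts in the summary can be reproduced from the layer shapes alone, using the usual Dense layer formula of inputs * outputs weights plus one bias per output unit:

n_features = 24995
n_hidden = 12500

layer1_params = n_features * n_hidden + n_hidden     # 312,450,000
dense_1_params = n_hidden * n_features + n_features  # 312,462,495
print(layer1_params + dense_1_params)                # 624,912,495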

Code to train the model

## Train
history = autoencoder.fit(train_matrix.toarray(), train_matrix.toarray(),
                epochs=50,
                batch_size=64,
                shuffle=True,
                validation_data=(test_matrix.toarray(), test_matrix.toarray()))

When I start training the model, I get the following error:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[24995,12500]
     [[Node: mul_3 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](beta_1/read, Variable/read)]]

I am using an NVIDIA Tesla K40c with 12 GB of memory. As per my understanding, the model should fit in memory, as 25000 * 12500 * 2 = 0.625 GB. Also, the input matrix dtype is numpy.float32.
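
For scale, a rough back-of-envelope sketch (editorial, not from the original post) of the parameter-related memory alone, assuming float32 weights and that training with Adam keeps a gradient plus two moment buffers per weight; note that the failing [24995, 12500] tensor has exactly the shape of layer1's weight matrix:

params = 624912495        # total parameters from the model summary
bytes_per_float = 4       # float32
copies = 4                # weights + gradients + Adam's m and v buffers
print(params * bytes_per_float / 1e9)           # ~2.5 GB for the weights alone
print(params * bytes_per_float * copies / 1e9)  # ~10 GB before activations and workspace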

@fchollet Can you please point out what exactly I am doing wrong here?

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
mahnerak commented, Aug 11, 2017

Your model summary shows that you have 624,912,495 params (float32 is 4 bytes), which is almost 2.5 GB. And your model does not consist of parameters alone: GPU memory is also used for your inputs, and a lot more memory can be used for runtime computations. It's not easy to calculate the memory usage of complex networks.

I recommend that you try smaller batch sizes (say 8, instead of your 64), then increase the batch size until it exceeds memory.
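
A minimal sketch of that trial-and-error approach (editorial, not from the original comment; in practice the process may need to be restarted between attempts, since a failed allocation can leave the GPU session in a bad state):

import tensorflow as tf

for batch_size in (8, 16, 32, 64):
    try:
        # one epoch is enough to see whether this batch size fits on the GPU
        autoencoder.fit(train_matrix.toarray(), train_matrix.toarray(),
                        epochs=1, batch_size=batch_size)
        print("batch_size", batch_size, "fits in memory")
    except tf.errors.ResourceExhaustedError:
        print("batch_size", batch_size, "is too large")
        break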

0 reactions
zeeshanalipanhwar commented, Apr 2, 2020

I was passing steps_per_epoch to model.fit rather than batch_size, even though steps_per_epoch was not needed in this case:

H = model.fit(X_train, Y_train, validation_data=(X_valid, Y_valid),
              steps_per_epoch=len(X_train)//BATCHSIZE, epochs=EPOCHS,
              validation_steps=len(X_valid)//BATCHSIZE)

Replacing that steps_per_epoch with batch_size somehow solved my issue.

H = model.fit(X_train, Y_train, batch_size=BATCHSIZE,
              epochs=EPOCHS, validation_data=(X_valid, Y_valid))

Read more comments on GitHub >

Top Results From Across the Web

OOM when allocating tensor with shape - Stack Overflow
ResourceExhaustedError : OOM when allocating tensor with shape[3840,155229] [[Node: decoder/previous_decoder/Add = Add[T=DT_FLOAT, _device="/job: ...

How to solve Error of ResourceExhaustedError in Tensorflow
ResourceExhaustedError : OOM when allocating tensor with shape[8,192,23,23] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator ...

tf.errors.ResourceExhaustedError | TensorFlow v2.11.0
For example, this error might be raised if a per-user quota is exhausted, or perhaps the entire file system is out of space....

Resource exhausted: OOM when allocating tensor with shape ...
Error got during object detection training with ssd_mobilenet_v2_quantized_300x300_coco model. I am running below command to start the ...

Resource exhausted: OOM when allo… - Apple Developer
ResourceExhaustedError : 2 root error(s) found. (0) Resource exhausted: OOM when allocating tensor with shape[256,384,3072] and type float on ...
