Resource exhausted error
See original GitHub issue.

I have a dataset where the number of samples is 25000 and the number of features is 24995. I am trying to train a Keras autoencoder model on this data and am running into an OOM error. Some specifics of the model are:
Input matrix shape : (25000, 24995)
This input matrix is split into a training set and a test set (the latter used as validation data):
Train Matrix shape : (18750, 24995)
Test Matrix shape : (6250, 24995)
The code for building and compiling the model is:

from keras.layers import Input, Dense
from keras.models import Model

input_layer = Input(shape=(train_matrix.shape[1],))

# Encoder: 24995 -> 12500
encoding_hlayer1_dims = 12500
encoding_hlayer1 = Dense(encoding_hlayer1_dims, activation='relu', trainable=True, name="layer1")(input_layer)

# Decoder: back to the input dimensionality (24995)
decoding_hlayer1_dims = 12500  # (unused: the decoder width is taken directly from train_matrix.shape[1])
decoding_hlayer1 = Dense(train_matrix.shape[1], activation='relu')(encoding_hlayer1)

autoencoder = Model(input_layer, decoding_hlayer1)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
The summary of the model is
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, 24995)             0
_________________________________________________________________
layer1 (Dense)               (None, 12500)             312450000
_________________________________________________________________
dense_1 (Dense)              (None, 24995)             312462495
=================================================================
Total params: 624,912,495
Trainable params: 624,912,495
Non-trainable params: 0
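As a quick sanity check (not part of the original post), these parameter counts follow from the usual dense-layer formula, (inputs + 1 bias) * units:

# Hypothetical check of the counts reported by model.summary() above
n_features = 24995  # input dimensionality
n_hidden = 12500    # width of the encoding layer

encoder_params = (n_features + 1) * n_hidden   # 312450000 (layer1)
decoder_params = (n_hidden + 1) * n_features   # 312462495 (dense_1)
print(encoder_params + decoder_params)         # 624912495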
Code to train the model:

## Train
history = autoencoder.fit(train_matrix.toarray(), train_matrix.toarray(),
                          epochs=50,
                          batch_size=64,
                          shuffle=True,
                          validation_data=(test_matrix.toarray(), test_matrix.toarray()))
When I start training the model, I get the following error:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[24995,12500]
[[Node: mul_3 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](beta_1/read, Variable/read)]]
I am using an NVIDIA Tesla K40c with 12 GB of memory. As far as I can tell, the model should fit in memory, since 25000 * 12500 * 2 = 625 million weights, which I estimated at about 0.625 GB. Also, the input matrix dtype is numpy.float32.
@fchollet Can you please point out what exactly I am doing wrong here?
Your model summary shows that you have 624,912,495 params, and a float32 is 4 bytes, so the weights alone are almost 2.5 GB. And your model does not consist of parameters only: GPU memory is also used for your inputs, and a lot more can be used for runtime computations. It is not easy to calculate the memory usage of complex networks.
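A rough back-of-the-envelope sketch of where that memory goes (an estimate only; the exact footprint depends on the TensorFlow version and its allocator):

# Rough estimate; assumes float32 weights, the Adam optimizer, and batch_size=64 as in the question
params = 624912495
bytes_per_float = 4

weights = params * bytes_per_float       # ~2.3 GiB for the weights alone
adam_slots = 2 * weights                 # Adam keeps two moment tensors per weight
gradients = weights                      # one gradient tensor per weight during backprop
activations = 64 * (24995 + 12500 + 24995) * bytes_per_float  # per-batch layer outputs (small here)

print((weights + adam_slots + gradients + activations) / 1024.0 ** 3)  # roughly 9.3 GiB, before any workspace or allocator overhead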
I recommend that you try a smaller batch size (say 8, instead of your 64), then increase it step by step until you hit the memory limit.
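For example, with the same train_matrix and test_matrix as in the question, the fit call would only change in its batch_size (a sketch of the suggestion, not a tested fix):

history = autoencoder.fit(train_matrix.toarray(), train_matrix.toarray(),
                          epochs=50,
                          batch_size=8,   # start small, then increase while it still fits in GPU memory
                          shuffle=True,
                          validation_data=(test_matrix.toarray(), test_matrix.toarray()))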
I was passing steps_per_epoch to model.fit instead of batch_size; steps_per_epoch was not needed in this case either.
Replacing steps_per_epoch with batch_size solved my issue.
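A minimal sketch of that change, assuming in-memory NumPy inputs as in this issue (the commenter's original call is not shown, so x here just stands for the dense training array):

# Before (hypothetical): steps_per_epoch passed instead of batch_size
# history = autoencoder.fit(x, x, epochs=50, steps_per_epoch=100)

# After: batch_size explicitly bounds how many samples are processed per step
history = autoencoder.fit(x, x, epochs=50, batch_size=8)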