Resource exhausted error
See original GitHub issue.

I have a dataset where the number of samples is 25000 and the number of features is 24995. I am trying to train a Keras autoencoder model on this data and am running into an OOM error. Some specifics of the model are:
Input matrix shape : (25000, 24995)
This input matrix is split into a training set and a test set (the latter used as validation data):
Train Matrix shape : (18750, 24995)
Test Matrix shape : (6250, 24995)
The code for building and compiling the model is:

from keras.layers import Input, Dense
from keras.models import Model

input_layer = Input(shape=(train_matrix.shape[1],))

# Encoder: 24995 -> 12500
encoding_hlayer1_dims = 12500
encoding_hlayer1 = Dense(encoding_hlayer1_dims, activation='relu', trainable=True, name="layer1")(input_layer)

# Decoder: back to the input dimensionality (24995)
decoding_hlayer1_dims = 12500  # (unused: the decoder width is taken directly from train_matrix.shape[1])
decoding_hlayer1 = Dense(train_matrix.shape[1], activation='relu')(encoding_hlayer1)

autoencoder = Model(input_layer, decoding_hlayer1)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
The summary of the model is
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, 24995)             0
_________________________________________________________________
layer1 (Dense)               (None, 12500)             312450000
_________________________________________________________________
dense_1 (Dense)              (None, 24995)             312462495
=================================================================
Total params: 624,912,495
Trainable params: 624,912,495
Non-trainable params: 0
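As a quick sanity check (not part of the original post), these parameter counts follow from the usual dense-layer formula, (inputs + 1 bias) * units:

# Hypothetical check of the counts reported by model.summary() above
n_features = 24995  # input dimensionality
n_hidden = 12500    # width of the encoding layer

encoder_params = (n_features + 1) * n_hidden   # 312450000 (layer1)
decoder_params = (n_hidden + 1) * n_features   # 312462495 (dense_1)
print(encoder_params + decoder_params)         # 624912495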
Code to train the model:

## Train
history = autoencoder.fit(train_matrix.toarray(), train_matrix.toarray(),
                          epochs=50,
                          batch_size=64,
                          shuffle=True,
                          validation_data=(test_matrix.toarray(), test_matrix.toarray()))
When I start training the model, I get the following error:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[24995,12500]
[[Node: mul_3 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](beta_1/read, Variable/read)]]
I am using an NVIDIA Tesla K40c with 12 GB of memory. As far as I can tell, the model should fit in memory, since 25000 * 12500 * 2 = 625 million weights, which I estimated at about 0.625 GB. Also, the input matrix dtype is numpy.float32.
@fchollet Can you please point out what exactly I am doing wrong here?
Your model summary shows that you have 624,912,495 params, and a float32 is 4 bytes, so the weights alone are almost 2.5 GB. And your model does not consist of parameters only: GPU memory is also used for your inputs, and a lot more can be used for runtime computations. It is not easy to calculate the memory usage of complex networks.
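A rough back-of-the-envelope sketch of where that memory goes (an estimate only; the exact footprint depends on the TensorFlow version and its allocator):

# Rough estimate; assumes float32 weights, the Adam optimizer, and batch_size=64 as in the question
params = 624912495
bytes_per_float = 4

weights = params * bytes_per_float       # ~2.3 GiB for the weights alone
adam_slots = 2 * weights                 # Adam keeps two moment tensors per weight
gradients = weights                      # one gradient tensor per weight during backprop
activations = 64 * (24995 + 12500 + 24995) * bytes_per_float  # per-batch layer outputs (small here)

print((weights + adam_slots + gradients + activations) / 1024.0 ** 3)  # roughly 9.3 GiB, before any workspace or allocator overhead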
I recommend that you try a smaller batch size (say 8, instead of your 64), then increase it step by step until you hit the memory limit.
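For example, with the same train_matrix and test_matrix as in the question, the fit call would only change in its batch_size (a sketch of the suggestion, not a tested fix):

history = autoencoder.fit(train_matrix.toarray(), train_matrix.toarray(),
                          epochs=50,
                          batch_size=8,   # start small, then increase while it still fits in GPU memory
                          shuffle=True,
                          validation_data=(test_matrix.toarray(), test_matrix.toarray()))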
I was passing steps_per_epoch to model.fit instead of batch_size; steps_per_epoch was not needed in this case either.
Replacing steps_per_epoch with batch_size solved my issue.
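A minimal sketch of that change, assuming in-memory NumPy inputs as in this issue (the commenter's original call is not shown, so x here just stands for the dense training array):

# Before (hypothetical): steps_per_epoch passed instead of batch_size
# history = autoencoder.fit(x, x, epochs=50, steps_per_epoch=100)

# After: batch_size explicitly bounds how many samples are processed per step
history = autoencoder.fit(x, x, epochs=50, batch_size=8)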