
ResourceExhaustedError

See original GitHub issue

I feed 30 pictures of size 480×528 to PSPNet, and my GPU is a GTX 1080 Ti. However, it returns the following error:

2018-10-18 00:33:05.626813: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 11 Chunks of size 152064000 totalling 1.56GiB
2018-10-18 00:33:05.626816: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 5 Chunks of size 243302400 totalling 1.13GiB
2018-10-18 00:33:05.626819: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 266727424 totalling 254.37MiB
2018-10-18 00:33:05.626823: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 334904064 totalling 319.39MiB
2018-10-18 00:33:05.626826: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 3 Chunks of size 364953600 totalling 1.02GiB
2018-10-18 00:33:05.626829: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 3 Chunks of size 486604800 totalling 1.36GiB
2018-10-18 00:33:05.626832: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 9.83GiB
2018-10-18 00:33:05.626837: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats: 
Limit:                 10586741146
InUse:                 10556298496
MaxInUse:              10556328192
NumAllocs:                   10164
MaxAllocSize:           3721396224
2018-10-18 00:33:05.626970: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ***************************************************************************************************x
2018-10-18 00:33:05.626989: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at conv_ops.cc:693 : Resource exhausted: OOM when allocating tensor with shape[30,320,60,66] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/public/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-4e46772605b9>", line 1, in <module>
    model.fit(X, Y, epochs=2)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/public/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1454, in __call__
    self._session._session, self._handle, args, status, None)
  File "/home/public/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[30,320,60,66] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: block35_7_conv/convolution = Conv2D[T=DT_FLOAT, _class=["loc:@train...kpropInput"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](block35_7_mixed/concat, block35_7_conv/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
	 [[Node: loss_1/mul/_8545 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_22137_loss_1/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

How can I solve this problem? Thank you!
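
A common first mitigation for an OOM like this, whatever the model, is to lower the batch size so fewer activations live on the GPU at once. Keras' model.fit defaults to batch_size=32, which here places all 30 images in a single step (hence the leading 30 in shape[30,320,60,66] above). A minimal sketch, assuming the same model and arrays as in the traceback; the value 4 is only an illustrative starting point:

# Train on smaller batches to reduce peak GPU memory; lower further if OOM persists.
model.fit(X, Y, epochs=2, batch_size=4)

If even batch_size=1 does not fit, downscaling the inputs or choosing a lighter backbone is the usual next step.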

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
qubvel commented, Oct 18, 2018

For FPN, shapes should be divisible by 32.
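
To make the constraint concrete for the 480×528 inputs from the question: 480 is a multiple of 32, but 528 is not, so the stride-16 feature map and the upsampled stride-32 feature map end up one column apart (33 vs. 34), which is exactly the mismatch in the follow-up error below (the leading 30 in those shapes is 480/16, the feature-map height, not the batch of 30 images). A quick arithmetic check, assuming those input dimensions:

import math

height, width = 480, 528
print(height % 32, width % 32)      # 0, 16 -> only the width violates the rule
print(width / 16)                   # 33.0  -> width of the stride-16 feature map
print(math.ceil(width / 32) * 2)    # 34    -> width after upsampling the stride-32 map
# The FPN block adds these two maps (Add()([x, up]) in the traceback below),
# so the 33 vs. 34 widths raise the broadcast ValueError.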

0 reactions
MaxKinny commented, Oct 18, 2018

I wondered whether another model would converge, so I changed the model from PSPNet to FPN, but I ran into this problem:

Traceback (most recent call last):
  File "/home/public/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-9709eb679a88>", line 1, in <module>
    model = FPN(backbone_name='inceptionresnetv2', input_shape=(480, 528, 3), freeze_encoder=True, classes=3)
  File "/home/public/anaconda3/lib/python3.6/site-packages/segmentation_models/fpn/model.py", line 84, in FPN
    activation=activation)
  File "/home/public/anaconda3/lib/python3.6/site-packages/segmentation_models/fpn/builder.py", line 65, in build_fpn
    stage=i)(c, m)
  File "/home/public/anaconda3/lib/python3.6/site-packages/segmentation_models/fpn/blocks.py", line 35, in layer
    x = Add()([x, up])
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/engine/base_layer.py", line 431, in __call__
    self.build(unpack_singleton(input_shapes))
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/layers/merge.py", line 91, in build
    shape)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/layers/merge.py", line 61, in _compute_elemwise_op_output_shape
    str(shape1) + ' ' + str(shape2))
ValueError: Operands could not be broadcast together with shapes (30, 33, 256) (30, 34, 256)

My code:

import numpy as np
import os
import keras
import cv2
from segmentation_models import FPN
from segmentation_models.utils import set_trainable
from segmentation_models.backbones import get_preprocessing


# prepare data
files_y = os.listdir("/home/public/Desktop/Inception_resnet_v2/Data/training_y")
files_x = os.listdir("/home/public/Desktop/Inception_resnet_v2/Data/training_x")
files_y.sort()
files_x.sort()
num_x = len(files_x)
num_y = len(files_y)
X = np.empty((num_x, 480, 528, 3))
Y = np.empty((num_y, 480, 528, 3))
for ind, file in enumerate(files_x):
    img = cv2.imread("/home/public/Desktop/Inception_resnet_v2/Data/training_x" + '/' + file)
    X[ind, :, :, :] = img
for ind, file in enumerate(files_y):
    img = cv2.imread("/home/public/Desktop/Inception_resnet_v2/Data/training_y" + '/' + file)
    Y[ind, :, :, :] = img

# pre-process
preprocessing_fn = get_preprocessing('inceptionresnetv2')
X = preprocessing_fn(X)
Y = preprocessing_fn(Y)


# prepare model
model = FPN(backbone_name='inceptionresnetv2', input_shape=(480, 528, 3), freeze_encoder=True, classes=3)
nadam = keras.optimizers.Nadam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, schedule_decay=0.004)
model.compile(optimizer=nadam, loss='binary_crossentropy', metrics=['binary_accuracy'])

# pretrain model decoder
model.fit(X, Y, epochs=10000)
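
One way to reconcile this snippet with the divisible-by-32 requirement above, sketched on the assumption that zero-padding the borders is acceptable for this data, is to pad the width from 528 up to 544 (the next multiple of 32) before building the model:

# Hypothetical adjustment, not from the original thread: pad width 528 -> 544 so
# both spatial dimensions are multiples of 32, then build FPN for the padded shape.
X = np.pad(X, ((0, 0), (0, 0), (0, 16), (0, 0)), mode='constant')
Y = np.pad(Y, ((0, 0), (0, 0), (0, 16), (0, 0)), mode='constant')
model = FPN(backbone_name='inceptionresnetv2', input_shape=(480, 544, 3),
            freeze_encoder=True, classes=3)

Resizing with cv2.resize (e.g. to 480×512) or cropping to a multiple of 32 would serve the same purpose; padding is simply the least invasive change to the arrays already built above.
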
Read more comments on GitHub >

Top Results From Across the Web

OOM when allocating tensor with shape - Stack Overflow
ResourceExhaustedError : OOM when allocating tensor with shape[3840,155229] [[Node: decoder/previous_decoder/Add = Add[T=DT_FLOAT, ...
Read more >
How to solve Error of ResourceExhaustedError in Tensorflow
ResourceExhaustedError : OOM when allocating tensor with shape[8,192,23,23] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator ...
Read more >
tf.errors.ResourceExhaustedError | TensorFlow v2.11.0
Some resource has been exhausted.
Read more >
Resource exhausted: OOM when allocating tensor with shape ...
ResourceExhaustedError : 2 root error(s) found. (0) Resource exhausted: OOM when allocating tensor with shape[32,960,10,10] and type float on ...
Read more >
ResourceExhaustedError: OOM when allocating tensor with ...
2021-11-25 13:18:23.325521: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: Limit: 91222016 InUse: 61331456 MaxInUse: ...
Read more >
