
ResourceExhaustedError

See original GitHub issue

I feed 30 pictures of size 480×528 to PSPNet, and my GPU is a GTX 1080 Ti. However, it returns the following error:

2018-10-18 00:33:05.626813: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 11 Chunks of size 152064000 totalling 1.56GiB
2018-10-18 00:33:05.626816: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 5 Chunks of size 243302400 totalling 1.13GiB
2018-10-18 00:33:05.626819: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 266727424 totalling 254.37MiB
2018-10-18 00:33:05.626823: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 334904064 totalling 319.39MiB
2018-10-18 00:33:05.626826: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 3 Chunks of size 364953600 totalling 1.02GiB
2018-10-18 00:33:05.626829: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 3 Chunks of size 486604800 totalling 1.36GiB
2018-10-18 00:33:05.626832: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 9.83GiB
2018-10-18 00:33:05.626837: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats: 
Limit:                 10586741146
InUse:                 10556298496
MaxInUse:              10556328192
NumAllocs:                   10164
MaxAllocSize:           3721396224
2018-10-18 00:33:05.626970: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ***************************************************************************************************x
2018-10-18 00:33:05.626989: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at conv_ops.cc:693 : Resource exhausted: OOM when allocating tensor with shape[30,320,60,66] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/public/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-4e46772605b9>", line 1, in <module>
    model.fit(X, Y, epochs=2)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/public/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1454, in __call__
    self._session._session, self._handle, args, status, None)
  File "/home/public/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[30,320,60,66] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: block35_7_conv/convolution = Conv2D[T=DT_FLOAT, _class=["loc:@train...kpropInput"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](block35_7_mixed/concat, block35_7_conv/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
	 [[Node: loss_1/mul/_8545 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_22137_loss_1/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

How can I solve this problem? Thank you!
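
A common first mitigation for an OOM like this, whatever the model, is to lower the batch size so fewer activations live on the GPU at once. Keras' model.fit defaults to batch_size=32, which here places all 30 images in a single step (hence the leading 30 in shape[30,320,60,66] above). A minimal sketch, assuming the same model and arrays as in the traceback; the value 4 is only an illustrative starting point:

# Train on smaller batches to reduce peak GPU memory; lower further if OOM persists.
model.fit(X, Y, epochs=2, batch_size=4)

If even batch_size=1 does not fit, downscaling the inputs or choosing a lighter backbone is the usual next step.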

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
qubvel commented, Oct 18, 2018

For FPN, shapes should be divisible by 32.
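
To make the constraint concrete for the 480×528 inputs from the question: 480 is a multiple of 32, but 528 is not, so the stride-16 feature map and the upsampled stride-32 feature map end up one column apart (33 vs. 34), which is exactly the mismatch in the follow-up error below (the leading 30 in those shapes is 480/16, the feature-map height, not the batch of 30 images). A quick arithmetic check, assuming those input dimensions:

import math

height, width = 480, 528
print(height % 32, width % 32)      # 0, 16 -> only the width violates the rule
print(width / 16)                   # 33.0  -> width of the stride-16 feature map
print(math.ceil(width / 32) * 2)    # 34    -> width after upsampling the stride-32 map
# The FPN block adds these two maps (Add()([x, up]) in the traceback below),
# so the 33 vs. 34 widths raise the broadcast ValueError.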

0 reactions
MaxKinny commented, Oct 18, 2018

I wondered whether another model would converge, so I changed the model from PSPNet to FPN, but I ran into this problem:

Traceback (most recent call last):
  File "/home/public/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-9709eb679a88>", line 1, in <module>
    model = FPN(backbone_name='inceptionresnetv2', input_shape=(480, 528, 3), freeze_encoder=True, classes=3)
  File "/home/public/anaconda3/lib/python3.6/site-packages/segmentation_models/fpn/model.py", line 84, in FPN
    activation=activation)
  File "/home/public/anaconda3/lib/python3.6/site-packages/segmentation_models/fpn/builder.py", line 65, in build_fpn
    stage=i)(c, m)
  File "/home/public/anaconda3/lib/python3.6/site-packages/segmentation_models/fpn/blocks.py", line 35, in layer
    x = Add()([x, up])
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/engine/base_layer.py", line 431, in __call__
    self.build(unpack_singleton(input_shapes))
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/layers/merge.py", line 91, in build
    shape)
  File "/home/public/anaconda3/lib/python3.6/site-packages/keras/layers/merge.py", line 61, in _compute_elemwise_op_output_shape
    str(shape1) + ' ' + str(shape2))
ValueError: Operands could not be broadcast together with shapes (30, 33, 256) (30, 34, 256)

My code:

import numpy as np
import os
import keras
import cv2
from segmentation_models import FPN
from segmentation_models.utils import set_trainable
from segmentation_models.backbones import get_preprocessing


# prepare data
files_y = os.listdir("/home/public/Desktop/Inception_resnet_v2/Data/training_y")
files_x = os.listdir("/home/public/Desktop/Inception_resnet_v2/Data/training_x")
files_y.sort()
files_x.sort()
num_x = len(files_x)
num_y = len(files_y)
X = np.empty((num_x, 480, 528, 3))
Y = np.empty((num_y, 480, 528, 3))
for ind, file in enumerate(files_x):
    img = cv2.imread("/home/public/Desktop/Inception_resnet_v2/Data/training_x" + '/' + file)
    X[ind, :, :, :] = img
for ind, file in enumerate(files_y):
    img = cv2.imread("/home/public/Desktop/Inception_resnet_v2/Data/training_y" + '/' + file)
    Y[ind, :, :, :] = img

# pre-process
preprocessing_fn = get_preprocessing('inceptionresnetv2')
X = preprocessing_fn(X)
Y = preprocessing_fn(Y)


# prepare model
model = FPN(backbone_name='inceptionresnetv2', input_shape=(480, 528, 3), freeze_encoder=True, classes=3)
nadam = keras.optimizers.Nadam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, schedule_decay=0.004)
model.compile(optimizer=nadam, loss='binary_crossentropy', metrics=['binary_accuracy'])

# pretrain model decoder
model.fit(X, Y, epochs=10000)
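
One way to reconcile this snippet with the divisible-by-32 requirement above, sketched on the assumption that zero-padding the borders is acceptable for this data, is to pad the width from 528 up to 544 (the next multiple of 32) before building the model:

# Hypothetical adjustment, not from the original thread: pad width 528 -> 544 so
# both spatial dimensions are multiples of 32, then build FPN for the padded shape.
X = np.pad(X, ((0, 0), (0, 0), (0, 16), (0, 0)), mode='constant')
Y = np.pad(Y, ((0, 0), (0, 0), (0, 16), (0, 0)), mode='constant')
model = FPN(backbone_name='inceptionresnetv2', input_shape=(480, 544, 3),
            freeze_encoder=True, classes=3)

Resizing with cv2.resize (e.g. to 480×512) or cropping to a multiple of 32 would serve the same purpose; padding is simply the least invasive change to the arrays already built above.
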
Read more comments on GitHub >

Top Results From Across the Web

OOM when allocating tensor with shape - Stack Overflow
ResourceExhaustedError : OOM when allocating tensor with shape[3840,155229] [[Node: decoder/previous_decoder/Add = Add[T=DT_FLOAT, ...
Read more >
How to solve Error of ResourceExhaustedError in Tensorflow
ResourceExhaustedError : OOM when allocating tensor with shape[8,192,23,23] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator ...
Read more >
tf.errors.ResourceExhaustedError | TensorFlow v2.11.0
Some resource has been exhausted.
Read more >
Resource exhausted: OOM when allocating tensor with shape ...
ResourceExhaustedError : 2 root error(s) found. (0) Resource exhausted: OOM when allocating tensor with shape[32,960,10,10] and type float on ...
Read more >
ResourceExhaustedError: OOM when allocating tensor with ...
2021-11-25 13:18:23.325521: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: Limit: 91222016 InUse: 61331456 MaxInUse: ...
Read more >
