ResourceExhaustedError after segmentation models update!

Hi! I was working with FPN with a ‘resnext101’ backbone on Google Colab. I had trained the model and run lots of experiments, and the results were very good. Today, after I updated segmentation models (actually, I have to reinstall it every time I use Google Colab), I got the error shown below. By the way, I tried Unet with a ‘vgg16’ backbone and everything went well. I wonder why FPN with a resnext101 backbone no longer fits in GPU memory when it fit two days ago.

Thank you very much, @qubvel.

Edit1:
FPN with vgg16 backbone is OK.
FPN with vgg19 backbone is OK.
FPN with resnet34 backbone is OK.
FPN with resnet50 backbone is NOT OK (the same error shown below).
FPN with resnet101 backbone is NOT OK (the same error shown below).
FPN with resnext50 backbone is NOT OK (the same error shown below).

Epoch 1/100
---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-22-1b2892f8cab2> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', 'history = model.fit_generator(\n    generator = zipped_train_generator,\n  validation_data=(X_validation, y_validation),\n    steps_per_epoch=len(X_train) // NUM_BATCH,\n    callbacks= callbacks_list,\n    verbose = 1,\n    epochs = NUM_EPOCH)')

9 frames
</usr/local/lib/python3.6/dist-packages/decorator.py:decorator-gen-60> in time(self, line, cell, local_ns)

<timed exec> in <module>()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[32,128,112,112] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node training/RMSprop/gradients/zeros_21}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[loss/mul/_11081]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[32,128,112,112] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node training/RMSprop/gradients/zeros_21}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.
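
To put the failing allocation in perspective, here is a rough sketch that works out how much memory the single tensor named in the OOM message needs. It assumes "type float" means 4-byte float32 and that the first dimension of shape[32,128,112,112] is the batch size; the helper names `tensor_bytes` and `to_mib` are made up for illustration. This is only one tensor out of many held on the GPU during training, so it does not explain why memory use changed between library versions.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor (float32 assumed by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / 2**20


# Shape copied from the OOM message above.
failing_shape = (32, 128, 112, 112)
print(to_mib(tensor_bytes(failing_shape)), "MiB")

# Assumption: the leading 32 is the batch size, so halving it halves this tensor.
half_batch_shape = (16, 128, 112, 112)
print(to_mib(tensor_bytes(half_batch_shape)), "MiB")
```

If the leading dimension really is the batch size, lowering it is the usual first thing to try when a model that used to fit stops fitting.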

Issue Analytics

State:
Created 4 years ago
Comments:18 (11 by maintainers)

Top GitHub Comments

2 reactions

qubvel commented, Aug 10, 2019

Yeah, that's strange…

1 reaction

qubvel commented, Aug 10, 2019

pip install -U segmentation-models==0.2.1
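
Since the package gets reinstalled on every Colab session, a small guard at the top of the notebook can confirm that the pinned version is the one actually in use. The sketch below is only an illustration: `check_pinned` is a hypothetical helper, it relies on the standard-library `importlib.metadata` module (Python 3.8 or newer), and it assumes the distribution name matches the one in the pip command above.

```python
from importlib import metadata

PINNED_VERSION = "0.2.1"  # version suggested in the comment above


def check_pinned(package="segmentation-models", expected=PINNED_VERSION):
    """Return a short status message about the installed version of `package`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed; run: pip install -U {package}=={expected}"
    if installed != expected:
        return f"{package} {installed} is installed, expected {expected}"
    return f"{package} {installed} matches the pinned version"


print(check_pinned())
```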
