Out of memory error when predicting with GPU

I have a model which trains and predicts fine using CPU-only TensorFlow (2.0.0). However, when I try to run it in a different conda environment with tensorflow-gpu (also 2.0.0), this is the output:

(tf-gpu) Z:\My Documents>python tensorflow/project/Training_v5.py
2019-11-22 08:37:53.351449: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\h5py\_hl\dataset.py:313: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
  "Use dataset[()] instead.", H5pyDeprecationWarning)
{'n_train': 663, 'n_validation': 73, 'validation_split': 0.1, 'downsample_factor': 2, 'output_shape': (50, 50), 'n_output_channels': 68, 'shuffle': True, 'sigma': 5, 'output_sigma': 1.25, 'use_graph': True, 'graph_scale': 1, 'random_seed': 1, 'augmenter': True, 'datapath': 'R:/Locust/bc175/Warren Research Technician/deepposekit/flat/annotation_flat_v3.h5', 'dataset': 'images', 'generator': 'DataGenerator', 'n_samples': 736, 'image_shape': (200, 200, 1), 'keypoints_shape': (33, 2)}
2019-11-22 08:38:06.557412: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-11-22 08:38:06.639050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1060 3GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:01:00.0
2019-11-22 08:38:06.648011: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-22 08:38:06.657471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-22 08:38:06.662034: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-11-22 08:38:06.672387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1060 3GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:01:00.0
2019-11-22 08:38:06.679775: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-22 08:38:06.687691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-22 08:38:07.269510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-22 08:38:07.274566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-11-22 08:38:07.277998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-11-22 08:38:07.284995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2107 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 3GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
{'name': 'StackedHourglass', 'n_stacks': 2, 'n_transitions': 3, 'bottleneck_factor': 2, 'filters': 256, 'subpixel': True, 'n_train': 663, 'n_validation': 73, 'validation_split': 0.1, 'downsample_factor': 2, 'output_shape': (50, 50), 'n_output_channels': 68, 'shuffle': True, 'sigma': 5, 'output_sigma': 1.25, 'use_graph': True, 'graph_scale': 1, 'random_seed': 1, 'augmenter': True, 'datapath': 'R:/Locust/bc175/Warren Research Technician/deepposekit/flat/annotation_flat_v3.h5', 'dataset': 'images', 'generator': 'DataGenerator', 'n_samples': 736, 'image_shape': (200, 200, 1), 'keypoints_shape': (33, 2)}
Epoch 1/2000
2019-11-22 08:44:03.979571: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-22 08:44:04.856741: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-11-22 08:44:05.036431: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll

Above is all the normal output. Below appears to be where the error comes from:

2019-11-22 08:44:15.288237: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 234.38MiB (rounded to 245760000). Current allocation summary follows.
2019-11-22 08:44:15.297222: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 16, Chunks in use: 15. 4.0KiB allocated for chunks. 3.8KiB in use in bin. 64B client-requested in use in bin.
2019-11-22 08:44:15.306538: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): Total Chunks: 296, Chunks in use: 296. 148.0KiB allocated for chunks. 148.0KiB in use in bin. 144.7KiB client-requested in use in bin.
2019-11-22 08:44:15.315796: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): Total Chunks: 547, Chunks in use: 547. 549.0KiB allocated for chunks. 549.0KiB in use in bin. 547.0KiB client-requested in use in bin.
2019-11-22 08:44:15.325491: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.334153: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.342867: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.351168: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.361080: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768): Total Chunks: 2, Chunks in use: 2. 98.0KiB allocated for chunks. 98.0KiB in use in bin. 98.0KiB client-requested in use in bin.
2019-11-22 08:44:15.370432: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536): Total Chunks: 10, Chunks in use: 10. 806.5KiB allocated for chunks. 806.5KiB in use in bin. 661.0KiB client-requested in use in bin.
2019-11-22 08:44:15.380524: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072): Total Chunks: 102, Chunks in use: 102. 13.06MiB allocated for chunks. 13.06MiB in use in bin. 12.75MiB client-requested in use in bin.
2019-11-22 08:44:15.390888: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144): Total Chunks: 9, Chunks in use: 9. 2.31MiB allocated for chunks. 2.31MiB in use in bin. 2.25MiB client-requested in use in bin.
2019-11-22 08:44:15.400214: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288): Total Chunks: 53, Chunks in use: 52. 30.61MiB allocated for chunks. 30.07MiB in use in bin. 29.25MiB client-requested in use in bin.
2019-11-22 08:44:15.410526: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576): Total Chunks: 1, Chunks in use: 0. 1.27MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.420570: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.429672: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304): Total Chunks: 3, Chunks in use: 2. 21.97MiB allocated for chunks. 14.65MiB in use in bin. 14.65MiB client-requested in use in bin.
2019-11-22 08:44:15.441092: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.451590: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.460364: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.470150: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-22 08:44:15.479661: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): Total Chunks: 5, Chunks in use: 4. 1.07GiB allocated for chunks. 937.50MiB in use in bin. 937.50MiB client-requested in use in bin.
2019-11-22 08:44:15.489973: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): Total Chunks: 2, Chunks in use: 2. 937.50MiB allocated for chunks. 937.50MiB in use in bin. 937.50MiB client-requested in use in bin.
2019-11-22 08:44:15.500336: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 234.38MiB was 128.00MiB, Chunk State:
2019-11-22 08:44:15.505805: I tensorflow/core/common_runtime/bfc_allocator.cc:891] Size: 161.44MiB | Requested Size: 0B | in_use: 0 | bin_num: 19, prev: Size: 234.38MiB | Requested Size: 234.38MiB | in_use: 1 | bin_num: -1
2019-11-22 08:44:15.516312: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 2209598464
2019-11-22 08:44:15.521449: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B03600000 next 1 of size 1280
2019-11-22 08:44:15.528409: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B03600500 next 5 of size 1280
2019-11-22 08:44:15.534291: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B03600A00 next 4 of size 1024
2019-11-22 08:44:15.539102: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B03600E00 next 7 of size 1024
2019-11-22 08:44:15.544718: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B03601200 next 2 of size 512
2019-11-22 08:44:15.550555: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B03601400 next 8 of size 512
2019-11-22 08:44:15.556552: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B03601600 next 9 of size 512
2019-11-22 08:44:15.562202: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B03601800 next 11 of size 512

This goes on for a fair bit, so I’ve cut most of it out.

`2019-11-22 08:44:21.235488: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B07CCB100 next 1014 of size 491520000
2019-11-22 08:44:21.242137: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B2518B100 next 1015 of size 491520000
2019-11-22 08:44:21.248308: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B4264B100 next 1029 of size 245760000
2019-11-22 08:44:21.254596: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B510AB100 next 1030 of size 245760000
2019-11-22 08:44:21.262012: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B5FB0B100 next 1043 of size 245760000
2019-11-22 08:44:21.268683: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B6E56B100 next 1044 of size 245760000
2019-11-22 08:44:21.274896: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0000000B7CFCB100 next 18446744073709551615 of size 169286400
2019-11-22 08:44:21.282446: I tensorflow/core/common_runtime/bfc_allocator.cc:914] Summary of in-use Chunks by size:
2019-11-22 08:44:21.288317: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 15 Chunks of size 256 totalling 3.8KiB
2019-11-22 08:44:21.293332: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 296 Chunks of size 512 totalling 148.0KiB
2019-11-22 08:44:21.299070: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 542 Chunks of size 1024 totalling 542.0KiB
2019-11-22 08:44:21.303852: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 1280 totalling 2.5KiB
2019-11-22 08:44:21.309380: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 3 Chunks of size 1536 totalling 4.5KiB
2019-11-22 08:44:21.315147: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 50176 totalling 98.0KiB
2019-11-22 08:44:21.320306: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 7 Chunks of size 69632 totalling 476.0KiB
2019-11-22 08:44:21.326092: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 84480 totalling 82.5KiB
2019-11-22 08:44:21.332237: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 126976 totalling 248.0KiB
2019-11-22 08:44:21.336916: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 98 Chunks of size 131072 totalling 12.25MiB
2019-11-22 08:44:21.342753: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 3 Chunks of size 196608 totalling 576.0KiB
2019-11-22 08:44:21.347654: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 258048 totalling 252.0KiB
2019-11-22 08:44:21.353470: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 8 Chunks of size 262144 totalling 2.00MiB
2019-11-22 08:44:21.358586: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 327680 totalling 320.0KiB
2019-11-22 08:44:21.363791: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 42 Chunks of size 589824 totalling 23.63MiB
2019-11-22 08:44:21.369619: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 591360 totalling 577.5KiB
2019-11-22 08:44:21.374665: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 591872 totalling 578.0KiB
2019-11-22 08:44:21.380761: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 594944 totalling 581.0KiB
2019-11-22 08:44:21.386268: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 598016 totalling 584.0KiB
2019-11-22 08:44:21.392045: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 600064 totalling 586.0KiB
2019-11-22 08:44:21.397775: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 610304 totalling 596.0KiB
2019-11-22 08:44:21.402543: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 611840 totalling 597.5KiB
2019-11-22 08:44:21.409106: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 720896 totalling 704.0KiB
2019-11-22 08:44:21.415070: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 917504 totalling 1.75MiB
2019-11-22 08:44:21.420489: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 7680000 totalling 14.65MiB
2019-11-22 08:44:21.426903: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 4 Chunks of size 245760000 totalling 937.50MiB
2019-11-22 08:44:21.432880: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 491520000 totalling 937.50MiB
2019-11-22 08:44:21.438181: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 1.89GiB
2019-11-22 08:44:21.443098: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 2209598464 memory_limit_: 2209598668 available bytes: 204 curr_region_allocation_bytes_: 4419197440
2019-11-22 08:44:21.452939: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 2209598668
InUse: 2030728448
MaxInUse: 2030728448
NumAllocs: 4833
MaxAllocSize: 508297216

2019-11-22 08:44:21.465913: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *********************************************************************************************_______
2019-11-22 08:44:21.474321: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at conv_ops.cc:947 : Resource exhausted: OOM when allocating tensor with shape[48,128,100,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "tensorflow/project/Training_v5.py", line 152, in <module>
    shuffle=True
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\deepposekit\models\engine.py", line 174, in fit
    **kwargs
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1297, in fit_generator
    steps_name='steps_per_epoch')
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 973, in train_on_batch
    class_weight=class_weight, reset_metrics=reset_metrics)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 264, in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 311, in train_on_batch
    output_loss_metrics=output_loss_metrics))
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 252, in _process_single_batch
    training=training))
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 127, in _model_loss
    outs = model(inputs, **kwargs)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in call
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in call
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 197, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1134, in call
    return self.conv_op(inp, filter)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 639, in call
    return self.call(inp, filter)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 238, in call
    name=self.name)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d
    name=name)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1031, in conv2d
    data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1130, in conv2d_eager_fallback
    ctx=_ctx, name=name)
  File "C:\Users\bc175\AppData\Local\conda\conda\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[48,128,100,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Conv2D]`

From looking through this, it seems the issue might be that my GPU, while supported by TensorFlow, has only 3 GB of dedicated memory. If so, is there any workaround for that limitation?
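
The size of the allocation that failed can be checked by hand from the tensor shape in the error message. The sketch below assumes that "type float" means 4-byte float32 and that the first dimension of shape[48,128,100,100] is the batch dimension; `tensor_bytes` is a hypothetical helper written for this illustration, not part of TensorFlow.

```python
from math import prod

def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor of the given shape (float32 assumed)."""
    return prod(shape) * bytes_per_element

# Shape reported in the ResourceExhaustedError above.
failed_shape = (48, 128, 100, 100)
failed_bytes = tensor_bytes(failed_shape)

# The same tensor if the first (assumed batch) dimension were 1.
per_sample_bytes = tensor_bytes((1,) + failed_shape[1:])

print(failed_bytes)             # 245760000, the "rounded to" figure in the log
print(failed_bytes / 2**20)     # 234.375 MiB, logged as 234.38MiB
print(per_sample_bytes / 2**20) # size of this one tensor per sample, in MiB
```

Under these assumptions, this one tensor scales linearly with the first dimension, which is why a smaller batch size is the usual first thing to try on a card where TensorFlow was given about 2107 MB.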

Top GitHub Comments

1 reaction

jgraving commented, Nov 22, 2019

Have you tried reducing the batch size during prediction? Set batch_size=1 when you call model.predict() and work your way up from there.
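
A minimal sketch of "start at 1 and work your way up": the helper doubles the batch size until a run fails with an out-of-memory error and returns the last size that worked. `largest_working_batch_size`, `run_with_batch_size` and `fake_predict` are hypothetical names for this illustration. With Keras, the callable would presumably wrap something like `model.predict(x, batch_size=b)` and `oom_errors` would be `(tf.errors.ResourceExhaustedError,)`; that wiring is an assumption and has not been tested against DeepPoseKit.

```python
def largest_working_batch_size(run_with_batch_size, max_batch_size,
                               oom_errors=(MemoryError,)):
    """Double the batch size from 1 until a run raises an OOM error.

    Returns the largest batch size that completed, or None if even 1 failed.
    """
    best = None
    batch_size = 1
    while batch_size <= max_batch_size:
        try:
            run_with_batch_size(batch_size)
        except oom_errors:
            break  # this size does not fit; keep the previous one
        best = batch_size
        batch_size *= 2
    return best


# Stand-in for a real prediction call: pretends anything above 12 samples
# per batch exhausts memory. This limit is made up for the demo.
def fake_predict(batch_size):
    if batch_size > 12:
        raise MemoryError("simulated OOM")


result = largest_working_batch_size(fake_predict, max_batch_size=64)
print(result)  # 8: sizes 1, 2, 4 and 8 succeed, 16 fails
```

The traceback above passes through fit_generator, so the same search could in principle wrap a single training step instead of a prediction call.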

0 reactions

stale[bot] commented, Dec 6, 2019

This issue has been automatically closed because it has not had recent activity.
