Resource exhausted: OOM when allocating tensor with shape[256,1114] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

See original GitHub issue

Hello,

I’m running the latest version of MusicVAE from the repository on Ubuntu 18.04 with CUDA 10.1 and TensorFlow 2.2.0, using the hier-trio_16bar configuration, and I get the error below (I tried different batch sizes, even 1, and different learning rates, but the problem is the same). Do you know how to fix it?

2020-06-09 21:01:27.365621: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] Stats:
Limit:        14684815360
InUse:        14684616704
MaxInUse:     14684815360
NumAllocs:          26588
MaxAllocSize:   181403648

2020-06-09 21:01:27.365991: W tensorflow/core/common_runtime/bfc_allocator.cc:439] ****************************************************************************************************
2020-06-09 21:01:27.366026: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at lstm_ops.cc:372 : Resource exhausted: OOM when allocating tensor with shape[256,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2020-06-09 21:01:34.147376: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
Traceback (most recent call last):
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[256,1114] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node swap_in_core_decoder_1/core_decoder_0/decoder/while/BasicDecoderStep/decoder/multi_rnn_cell/cell_0/lstm_cell/LSTMBlockCell_13_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[add/_2901]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[256,1114] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node swap_in_core_decoder_1/core_decoder_0/decoder/while/BasicDecoderStep/decoder/multi_rnn_cell/cell_0/lstm_cell/LSTMBlockCell_13_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "music_vae_train.py", line 340, in <module>
    console_entry_point()
  File "music_vae_train.py", line 336, in console_entry_point
    tf.app.run(main)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "music_vae_train.py", line 331, in main
    run(configs.CONFIG_MAP)
  File "music_vae_train.py", line 312, in run
    task=FLAGS.task)
  File "music_vae_train.py", line 211, in train
    is_chief=is_chief)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tf_slim/training/training.py", line 551, in train
    loss = session.run(train_op, run_metadata=run_metadata)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 778, in run
    run_metadata=run_metadata)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1283, in run
    run_metadata=run_metadata)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1384, in run
    raise six.reraise(*original_exc_info)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1369, in run
    return self._sess.run(*args, **kwargs)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1442, in run
    run_metadata=run_metadata)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in run
    return self._sess.run(*args, **kwargs)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 958, in run
    run_metadata_ptr)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1181, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/burashnikova/env-tf22/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[256,1114] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node swap_in_core_decoder_1/core_decoder_0/decoder/while/BasicDecoderStep/decoder/multi_rnn_cell/cell_0/lstm_cell/LSTMBlockCell_13_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[add/_2901]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[256,1114] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node swap_in_core_decoder_1/core_decoder_0/decoder/while/BasicDecoderStep/decoder/multi_rnn_cell/cell_0/lstm_cell/LSTMBlockCell_13_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.
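
The “report_tensor_allocations_upon_oom” hint in the traceback refers to a field on TensorFlow’s RunOptions proto. Below is a minimal sketch of enabling it for a plain TF1-style session.run() call; the toy graph is only a stand-in, not MusicVAE, whose training loop goes through tf_slim and would need the options threaded into its own session.run.

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Stand-in graph; swap in the real model / train_op you are running.
x = tf.random.normal([256, 1024])
w = tf.random.normal([1024, 512])
loss = tf.reduce_sum(tf.matmul(x, w))

# Ask TensorFlow to list live tensor allocations if an OOM occurs, so the
# ResourceExhaustedError shows which tensors are filling the GPU.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    sess.run(loss, options=run_options)

The extra allocation report is only produced when an OOM actually occurs, so enabling the flag is essentially free in the normal case.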

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14 (5 by maintainers)

Top GitHub Comments

1 reaction
adarob commented, Jun 9, 2020

This means your GPU does not have enough memory to support the model + batch size you’re using. Try reducing the batch size until it fits.
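
Concretely, for MusicVAE the batch size lives in the config’s hparams. Here is a minimal sketch of shrinking it before training, assuming the hier-trio_16bar config named in the question and that the installed Magenta version exposes its hparams this way:

from magenta.models.music_vae import configs

# Pick the config from the question and cut its batch size. The OOM tensors
# have a leading dimension of 256, most likely the batch dimension, so
# halving it (128, 64, ...) until training fits is the usual first step.
config = configs.CONFIG_MAP['hier-trio_16bar']
config.hparams.batch_size = 64   # or config.hparams.set_hparam('batch_size', 64)

print(config.hparams.batch_size)

If your copy of music_vae_train.py defines an --hparams flag, passing --hparams=batch_size=64 on the command line should have the same effect; check the flags defined in your version first.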

0 reactions
AI-Guru commented, May 23, 2022

I understand. Training such models is often a challenge of patience. Either get a stronger GPU (or multiple) or be patient 🤗

Read more comments on GitHub >

