ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096]

I am trying to train Faster R-CNN on a custom dataset based on Pascal VOC, but I get this error when I start training:

Stats: Limit: 1696386252 InUse: 1685909760 MaxInUse: 1696386048 NumAllocs: 152 MaxAllocSize: 533417472

2017-06-03 04:51:48.992694: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *********************************************************************************************xxxxxxx
2017-06-03 04:51:48.992751: W tensorflow/core/framework/op_kernel.cc:1152] Resource exhausted: OOM when allocating tensor with shape[25088,4096]
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/…/lib/fast_rcnn/train.py", line 400, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/…/lib/fast_rcnn/train.py", line 148, in train_model
    sess.run(tf.global_variables_initializer())
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4096]
  [[Node: fc6/biases/Momentum/Assign = Assign[T=DT_FLOAT, _class=["loc:@fc6/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](fc6/biases/Momentum, fc6/biases/Momentum/Initializer/Const)]]

Caused by op u'fc6/biases/Momentum/Assign', defined at:
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/…/lib/fast_rcnn/train.py", line 400, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/…/lib/fast_rcnn/train.py", line 143, in train_model
    train_op = opt.apply_gradients(zip(grads, tvars), global_step=global_step)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
    self._create_slots([_get_variable_for(v) for v in var_list])
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/momentum.py", line 63, in _create_slots
    self._zeros_slot(v, "momentum", self._name)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
    named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
    dtype)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
    validate_shape=validate_shape)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 714, in _get_single_variable
    validate_shape=validate_shape)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 197, in __init__
    expected_shape=expected_shape)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 306, in _init_from_args
    validate_shape=validate_shape).op
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 270, in assign
    validate_shape=validate_shape)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
    use_locking=use_locking, name=name)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096]
  [[Node: fc6/biases/Momentum/Assign = Assign[T=DT_FLOAT, _class=["loc:@fc6/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](fc6/biases/Momentum, fc6/biases/Momentum/Initializer/Const)]]

How can I make this work? I don't know how to reduce the batch size to see whether that helps.
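The allocator statistics and tensor shapes quoted in the log can be turned into byte counts to see how little headroom is left. The sketch below is only arithmetic on the numbers from the post. It assumes 4 bytes per element (the log reports `DT_FLOAT`), and the helper name `tensor_bytes` is made up for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: DT_FLOAT tensors, as reported in the log


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Values copied from the "Stats:" line in the post.
limit = 1696386252
in_use = 1685909760
free_bytes = limit - in_use

# Shapes copied from the two OOM messages in the log.
large_tensor = tensor_bytes([25088, 4096])
small_tensor = tensor_bytes([4096])

MIB = 1024.0 * 1024.0
print("allocator limit : %.1f MiB" % (limit / MIB))
print("free at failure : %.1f MiB" % (free_bytes / MIB))
print("shape[25088,4096]: %.1f MiB" % (large_tensor / MIB))
print("shape[4096]      : %.1f KiB" % (small_tensor / 1024.0))
```

A single float32 tensor of shape[25088,4096] takes roughly a quarter of the reported limit, and the momentum optimizer in the traceback creates one extra slot variable per trainable variable on top of that.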

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 23

Top GitHub Comments

21 reactions
LMdeLiangMi commented, Jul 2, 2017

Fixed! With the settings below, training now starts. It is at iteration 450/70000 (may take a couple of hours).

config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'
config.gpu_options.per_process_gpu_memory_fraction = 0.90

17 reactions
AuroraLHT commented, Jun 16, 2017

To keep an eye on GPU usage: sudo watch nvidia-smi
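If a scriptable reading is more convenient than watching the full `nvidia-smi` screen, the same tool can print just the memory columns as CSV. The following is a minimal sketch, not part of the original thread. The function names are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports numeric values for the GPU.

```python
import subprocess

# nvidia-smi query flags: per-GPU used and total memory in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse 'used, total' lines into a list of (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib():
    """Run nvidia-smi once and return one (used_mib, total_mib) pair per GPU."""
    output = subprocess.check_output(QUERY).decode("utf-8")
    return parse_memory_csv(output)
```

Calling `gpu_memory_mib()` before and after building the graph gives a rough idea of how much memory the process claims. Keep in mind that TensorFlow reserves most of the GPU up front by default, so the reading reflects the reservation rather than what the model actually needs.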

You can tweak the source code to expand GPU memory usage: try modifying the parameters on the lines below in TFFRCNN/lib/fast_rcnn/train.py

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allocator_type = 'BFC'
config.gpu_options.per_process_gpu_memory_fraction = 0.40

config.gpu_options.allow_growth=True is always my favored option since you don’t need to care about the actual usage.
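The snippets in the comments set fields on the config but do not show it being handed to a session. A minimal sketch of how the pieces fit together is below. The function name `make_session` and its parameters are hypothetical, and how TFFRCNN itself creates its session is not shown in this thread. The `getattr` line is there so the same code runs on TensorFlow 1.x and, through `tf.compat.v1`, on 2.x.

```python
def make_session(memory_fraction=None, allow_growth=True):
    """Create a session with the GPU options discussed in the comments.

    memory_fraction: optional cap in (0, 1] on the share of GPU memory
    this process may use; None leaves TensorFlow's default in place.
    """
    if memory_fraction is not None and not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")

    # Imported lazily so the argument check works without TensorFlow installed.
    import tensorflow as tf

    tf1 = getattr(tf.compat, "v1", tf)  # TF 2.x exposes the 1.x API here

    config = tf1.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allocator_type = "BFC"
    # Allocate GPU memory on demand instead of reserving it all at start-up.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return tf1.Session(config=config)
```

For example, `sess = make_session(memory_fraction=0.9)` mirrors the 0.90 setting from the first comment. Neither option creates memory that is not there: if the model plus optimizer slots exceed what the card has, a smaller network or smaller inputs are still needed.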
