
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096]


I am trying to train Faster R-CNN on a custom dataset based on pascal_VOC, but I get this error when I start training:

Stats: Limit: 1696386252 InUse: 1685909760 MaxInUse: 1696386048 NumAllocs: 152 MaxAllocSize: 533417472

2017-06-03 04:51:48.992694: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *********************************************************************************************xxxxxxx
2017-06-03 04:51:48.992751: W tensorflow/core/framework/op_kernel.cc:1152] Resource exhausted: OOM when allocating tensor with shape[25088,4096]
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/…/lib/fast_rcnn/train.py", line 400, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/…/lib/fast_rcnn/train.py", line 148, in train_model
    sess.run(tf.global_variables_initializer())
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4096]
  [[Node: fc6/biases/Momentum/Assign = Assign[T=DT_FLOAT, _class=["loc:@fc6/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](fc6/biases/Momentum, fc6/biases/Momentum/Initializer/Const)]]

Caused by op u'fc6/biases/Momentum/Assign', defined at:
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/…/lib/fast_rcnn/train.py", line 400, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/…/lib/fast_rcnn/train.py", line 143, in train_model
    train_op = opt.apply_gradients(zip(grads, tvars), global_step=global_step)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
    self._create_slots([_get_variable_for(v) for v in var_list])
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/momentum.py", line 63, in _create_slots
    self._zeros_slot(v, "momentum", self._name)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
    named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
    dtype)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
    validate_shape=validate_shape)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 714, in _get_single_variable
    validate_shape=validate_shape)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 197, in __init__
    expected_shape=expected_shape)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 306, in _init_from_args
    validate_shape=validate_shape).op
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 270, in assign
    validate_shape=validate_shape)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
    use_locking=use_locking, name=name)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/hadi/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096]
  [[Node: fc6/biases/Momentum/Assign = Assign[T=DT_FLOAT, _class=["loc:@fc6/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](fc6/biases/Momentum, fc6/biases/Momentum/Initializer/Const)]]

How can I make this work? I don’t know how to reduce batch size and see if that helps.
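
For a sense of scale, the sizes in the log can be worked out by hand. The sketch below is only a rough estimate: it assumes 4 bytes per element (float32, going by T=DT_FLOAT in the node definition), and the helper name tensor_bytes is made up for illustration.

```python
# Back-of-the-envelope sizes for the tensors named in the log above.
# Assumption: 4 bytes per element (float32, per T=DT_FLOAT in the node def).
BYTES_PER_FLOAT32 = 4
MIB = 1024.0 ** 2  # float so the divisions below also work on Python 2


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the raw size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


limit = 1696386252   # "Limit" from the Stats line
in_use = 1685909760  # "InUse" from the Stats line

big_tensor = tensor_bytes([25088, 4096])  # shape from the op_kernel warning
bias_tensor = tensor_bytes([4096])        # shape from the final error

print("allocator limit : %.1f MiB" % (limit / MIB))
print("not yet in use  : %.1f MiB" % ((limit - in_use) / MIB))
print("[25088, 4096]   : %.1f MiB" % (big_tensor / MIB))
print("[4096]          : %.1f KiB" % (bias_tensor / 1024.0))
```

A single [25088, 4096] float32 tensor comes to 392 MiB, against an allocator limit of roughly 1618 MiB. The traceback also shows the Momentum optimizer creating a zeros slot for each variable, so a variable of that shape would need a second allocation of the same size.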

Issue Analytics

State:
Created 6 years ago
Comments:23

Top GitHub Comments

21 reactions

LMdeLiangMi commented, Jul 2, 2017

Fixed! With the settings below, training now starts. It is at iteration 450/70000 (it may need a couple of hours).

config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'
config.gpu_options.per_process_gpu_memory_fraction = 0.90

17 reactions

AuroraLHT commented, Jun 16, 2017

To keep an eye on GPU usage: sudo watch nvidia-smi

You can tweak the source code to change how much GPU memory is used: try modifying the parameters on the lines below in TFFRCNN/lib/fast_rcnn/train.py

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allocator_type = 'BFC'
config.gpu_options.per_process_gpu_memory_fraction = 0.40

config.gpu_options.allow_growth=True is always my favored option since you don’t need to care about the actual usage.
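
Neither snippet above shows the config being handed to a session, which is the step that makes the options take effect. A minimal sketch of that step, using the TF 1.x API that appears in the traceback; the variable name sess and the idea that this replaces the existing session setup in train.py are assumptions, not the repository's actual code. On TensorFlow 2.x the same classes live under tf.compat.v1.

```python
import tensorflow as tf  # TF 1.x, as in the thread

# Let ops fall back to another device when the requested one is unavailable.
config = tf.ConfigProto(allow_soft_placement=True)

# Grow GPU memory on demand instead of reserving a fixed share up front.
config.gpu_options.allow_growth = True

# Alternative from the comments above: cap the share of GPU memory instead.
# config.gpu_options.per_process_gpu_memory_fraction = 0.40

# The options only apply to sessions created with this config.
sess = tf.Session(config=config)
```

Neither option adds memory to the GPU; both only change how TensorFlow claims what is already there, so a model that does not fit may still need smaller inputs or a smaller batch.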
