Memory requirements

Hello, I am attempting to run this code:

python3 experiment.py --settings_file test

But I am running out of memory (OOM error):

2017-12-09 23:17:18.540786: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ***************************************************************************************************x
2017-12-09 23:17:18.540796: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[3988,3988]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3988,3988]
	 [[Node: mul_790 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Neg_102, add_467)]]
	 [[Node: truediv_233/_165 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_216_truediv_233", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "experiment.py", line 221, in <module>
    mmd2, that_np = sess.run(mix_rbf_mmd2_and_ratio(eval_test_real, eval_test_sample,biased=False, sigmas=sigma))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3988,3988]
	 [[Node: mul_790 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Neg_102, add_467)]]
	 [[Node: truediv_233/_165 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_216_truediv_233", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'mul_790', defined at:
  File "experiment.py", line 221, in <module>
    mmd2, that_np = sess.run(mix_rbf_mmd2_and_ratio(eval_test_real, eval_test_sample,biased=False, sigmas=sigma))
  File "/home/jchook/dev/RGAN/mmd.py", line 71, in mix_rbf_mmd2_and_ratio
    K_XX, K_XY, K_YY, d = _mix_rbf_kernel(X, Y, sigmas, wts)
  File "/home/jchook/dev/RGAN/mmd.py", line 52, in _mix_rbf_kernel
    K_YY += wt * tf.exp(-gamma * (-2 * YY + c(Y_sqnorms) + r(Y_sqnorms)))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 894, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 1117, in _mul_dispatch
    return gen_math_ops._mul(x, y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 2726, in _mul
    "Mul", x=x, y=y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[3988,3988]
	 [[Node: mul_790 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Neg_102, add_467)]]
	 [[Node: truediv_233/_165 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_216_truediv_233", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

What are the minimum GPU memory requirements?
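
For reference, one generic thing to try before touching the code (this is a standard TensorFlow 1.x session option, not something specific to RGAN) is to stop TensorFlow from pre-allocating the whole GPU and let it grow memory on demand, or cap the fraction it may use. It will not help if the graph genuinely needs more memory than the card has, but it rules out plain over-allocation:

import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of grabbing
# the whole card up front; optionally cap the fraction it may use.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.8
sess = tf.Session(config=config)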

Top GitHub Comments

1 reaction
corcra commented, Jan 24, 2018

The MMD score is only used for evaluation, so it shouldn’t affect training.

The main way it might affect you is that we use the MMD score (on the validation set) to decide when to save model parameters (https://github.com/ratschlab/RGAN/blob/master/experiment.py#L227), so without it you will default to the normal frequency, which is every 50 epochs (https://github.com/ratschlab/RGAN/blob/master/experiment.py#L273).
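
To make the two policies concrete, here is a minimal, self-contained sketch of the saving logic described above (the names are illustrative, not the ones used in experiment.py):

def should_save(epoch, mmd2, best_mmd2, use_mmd, save_every=50):
    if use_mmd:
        # save only when the validation MMD^2 improves
        return mmd2 is not None and mmd2 < best_mmd2
    # without the MMD score, fall back to the fixed schedule
    return epoch % save_every == 0

# With MMD disabled, checkpoints would land at epochs 0, 50, 100, ...
print([e for e in range(151) if should_save(e, None, float("inf"), use_mmd=False)])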

1 reaction
corcra commented, Jan 22, 2018

You could also vary the size of the set used in evaluation (which gets fed into the MMD calculation); it is set on this line: https://github.com/ratschlab/RGAN/blob/master/experiment.py#L75. batch_multiplier is how many batches' worth of data we want to include in the evaluation set.

The problem with reducing the evaluation set size is that it reduces the accuracy of the MMD calculation, but depending on your use case that may be an acceptable price to pay for the code actually running on your hardware. (I’m assuming based on your error log that the OOM is happening due to the MMD calculation, which is quadratic in the number of samples.)
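
To illustrate why the cost is quadratic, here is a small NumPy sketch of a single-bandwidth RBF MMD^2 (a simplified stand-in for the multi-bandwidth version in mmd.py, not the repository's exact code). Each pairwise kernel matrix is n x n for n evaluation samples, so halving the evaluation set roughly quarters the memory needed for these intermediates:

import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    # Biased RBF-MMD^2 estimate. The three pairwise kernel matrices are
    # each (n, n), so memory grows as O(n^2) -- these are the 3988x3988
    # tensors in the error log above.
    gamma = 1.0 / (2.0 * sigma ** 2)
    XX, YY, XY = X @ X.T, Y @ Y.T, X @ Y.T
    X_sq = np.diag(XX)[:, None]
    Y_sq = np.diag(YY)[:, None]
    K_XX = np.exp(-gamma * (X_sq + X_sq.T - 2 * XX))
    K_YY = np.exp(-gamma * (Y_sq + Y_sq.T - 2 * YY))
    K_XY = np.exp(-gamma * (X_sq + Y_sq.T - 2 * XY))
    return K_XX.mean() + K_YY.mean() - 2 * K_XY.mean()

n = 3988                                  # evaluation-set size from the log
print(n * n * 4 / 1e6, "MB per float32 kernel matrix")   # ~63.6 MB each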
