How to reduce GPU memory?

What a wonderful project! I have used it to solve some problems. But there is one problem that always bothers me.

In one case I have to use rnn_size=512, num_layers=2, and seq_length=1200, with other arguments such as batch_size=10, num_epochs=50, and grad_clip=5.0. Training then allocates 7.23GiB on a GPU that has only 8GB free, and it runs out of memory. Is there a way to bring GPU memory usage down to 7GiB or less so that the job fits on the GPU? rnn_size, num_layers, and seq_length cannot be changed.

Here is some of the output.

I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 22 Chunks of size 256 totalling 5.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 512 totalling 2.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7499 Chunks of size 2048 totalling 14.65MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1087 Chunks of size 4096 totalling 4.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 4608 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 6144 totalling 6.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 616 Chunks of size 8192 totalling 4.81MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 9984 totalling 9.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 10240 totalling 40.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 12288 totalling 24.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 303 Chunks of size 14336 totalling 4.14MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 198656 totalling 970.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 208384 totalling 203.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 919 Chunks of size 8388608 totalling 7.18GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 10775552 totalling 10.28MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 14428160 totalling 13.76MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 7.23GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats: Limit: 7967745639 InUse: 7764832256 MaxInUse: 7764842496 NumAllocs: 60834 MaxAllocSize: 14428160

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 8.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[1024,2048]
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
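For readers trying to interpret the log, the figures can be cross-checked with a little arithmetic. The sketch below uses only numbers copied from the output above, plus two labeled assumptions that the issue does not state: that tensors are float32 (4 bytes per element), and that the LSTM input width equals rnn_size. Under those assumptions, a [1024, 2048] tensor is exactly one 8388608-byte chunk, and that shape is what an LSTM weight matrix of size (input + hidden) x (4 * hidden) would have. This is a reading of the log, not a confirmed diagnosis.

```python
# Figures copied from the allocator log above.
CHUNK_BYTES = 8388608        # size of the dominant chunk class
CHUNK_COUNT = 919            # number of chunks of that size
LIMIT_BYTES = 7967745639     # "Limit" in the Stats line
IN_USE_BYTES = 7764832256    # "InUse" in the Stats line

MIB = 2 ** 20
GIB = 2 ** 30

# Assumption (not stated in the issue): tensors are float32, 4 bytes each.
BYTES_PER_FLOAT32 = 4

# Size of the tensor named in the OOM message, shape [1024, 2048].
oom_tensor_bytes = 1024 * 2048 * BYTES_PER_FLOAT32

# Assumption (not stated in the issue): the LSTM input width equals rnn_size.
rnn_size = 512
input_size = rnn_size
lstm_weight_shape = (input_size + rnn_size, 4 * rnn_size)

# Memory held by the 8 MiB chunks, and their share of everything in use.
big_chunks_bytes = CHUNK_COUNT * CHUNK_BYTES
big_chunks_share = big_chunks_bytes / IN_USE_BYTES

print("OOM tensor size (MiB):", oom_tensor_bytes / MIB)
print("LSTM weight shape under the assumptions:", lstm_weight_shape)
print("8 MiB chunks total (GiB):", round(big_chunks_bytes / GIB, 2))
print("In use (GiB):", round(IN_USE_BYTES / GIB, 2))
print("Allocator limit (GiB):", round(LIMIT_BYTES / GIB, 2))
print("Share of in-use memory in 8 MiB chunks:", round(big_chunks_share, 4))
```

Nearly all of the in-use memory sits in chunks the size of one [1024, 2048] float32 tensor, which suggests that many copies of a weight-sized tensor are being kept alive at once.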

Sorry for my poor English, and thanks a lot!

Top GitHub Comments

6 reactions

fujimotomh commented, Oct 31, 2016

@ckcz123 You almost have it. dynamic_rnn takes the input as a tensor and not a list. This works on my laptop with a seq_length of 1200.

outputs, last_state = tf.nn.dynamic_rnn(cell, tf.nn.embedding_lookup(embedding, self.input_data), initial_state=self.initial_state, scope='rnnlm')

To confirm correctness, I think the best thing to do would be to run it with the default parameters and check that you can reach a low loss on the training set. I suspect this will work, though, as rnn_decoder and dynamic_rnn claim to have the same function.
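As an illustration of the "tensor and not a list" point, here is a shapes-only sketch in plain Python with no TensorFlow involved. It assumes the list-based API receives seq_length separate [batch, dim] slices, while dynamic_rnn in its default batch-major mode receives a single [batch, seq_length, dim] tensor. The helpers stack_steps and shape are hypothetical names made up for this example and are not part of any library.

```python
def stack_steps(per_step):
    """Convert a list of seq_length slices, each [batch, dim],
    into one batch-major nested list of shape [batch, seq_length, dim]."""
    seq_length = len(per_step)
    batch_size = len(per_step[0])
    return [
        [per_step[t][b] for t in range(seq_length)]
        for b in range(batch_size)
    ]


def shape(nested):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# Small stand-in sizes; the issue itself uses seq_length=1200, batch_size=10.
seq_length, batch_size, dim = 4, 2, 3

# List layout: one [batch, dim] slice per time step.
per_step = [
    [[float(t)] * dim for _ in range(batch_size)]
    for t in range(seq_length)
]

# Tensor layout: a single [batch, seq_length, dim] structure.
stacked = stack_steps(per_step)

print("list layout:", len(per_step), "slices of shape", shape(per_step[0]))
print("tensor layout:", shape(stacked))
```

In the one-line fix above, the output of tf.nn.embedding_lookup is already in the single-tensor layout, so it can be passed to dynamic_rnn directly without being split into per-step slices.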

0 reactions

ckcz123 commented, Nov 1, 2016

@fujimotomh Oh, it works! It now uses only 1.1G of GPU memory! Thanks for your advice!