Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

How to reduce GPU memory?

See original GitHub issue

What a wonderful project! I have used it to solve several problems, but there is one issue that keeps bothering me.

For one of my cases I have to use rnn_size=512, num_layers=2, seq_length=1200, with other arguments batch_size=10, num_epochs=50, grad_clip=5.0, and so on. But this allocates 7.23GiB on the GPU, which only has 8GB free. So I wonder whether I can reduce GPU memory usage to 7GiB or less; if so, I can run it on the GPU. rnn_size, num_layers, and seq_length cannot be modified.

Here is some of the output:

I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 22 Chunks of size 256 totalling 5.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 512 totalling 2.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7499 Chunks of size 2048 totalling 14.65MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1087 Chunks of size 4096 totalling 4.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 4608 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 6144 totalling 6.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 616 Chunks of size 8192 totalling 4.81MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 9984 totalling 9.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 10240 totalling 40.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 12288 totalling 24.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 303 Chunks of size 14336 totalling 4.14MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 198656 totalling 970.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 208384 totalling 203.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 919 Chunks of size 8388608 totalling 7.18GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 10775552 totalling 10.28MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 14428160 totalling 13.76MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 7.23GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit:        7967745639
InUse:        7764832256
MaxInUse:     7764842496
NumAllocs:    60834
MaxAllocSize: 14428160

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 8.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[1024,2048]
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Sorry for my poor English, and thanks a lot!
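As an aside, not something raised in this thread and no substitute for shrinking what the graph itself needs: TensorFlow's GPUOptions control how much device memory the process reserves up front. A minimal sketch, using assumed TF 1.x-style names rather than the 2016-era API from the logs above:

import tensorflow as tf

# Sketch only (assumed TF 1.x API): allow_growth allocates GPU memory on
# demand instead of reserving nearly all of it at startup; the commented
# line below would instead cap the reservation at a fixed fraction.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.875  # ~7GiB of 8GB

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... build and run the training loop here ...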

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 5

Top GitHub Comments

6 reactions
fujimotomh commented, Oct 31, 2016

@ckcz123 You almost have it. dynamic_rnn takes its input as a single tensor, not a list. This works on my laptop with a seq_length of 1200.

outputs, last_state = tf.nn.dynamic_rnn(
    cell,
    tf.nn.embedding_lookup(embedding, self.input_data),
    initial_state=self.initial_state,
    scope='rnnlm')

To confirm correctness, I think the best thing to do would be to run it with default parameters and see if you can get a low loss on the training set. I would expect it to work, though, since rnn_decoder and dynamic_rnn claim to compute the same function.
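For context, here is a minimal self-contained sketch of the suggested change. The TF 1.x-style names, vocab_size, and placeholder shapes are assumptions for illustration; only the dynamic_rnn call itself comes from the comment above.

import tensorflow as tf

# Assumed hyperparameters mirroring the thread; vocab_size is hypothetical.
rnn_size, num_layers = 512, 2
seq_length, batch_size, vocab_size = 1200, 10, 65

cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(rnn_size) for _ in range(num_layers)])

input_data = tf.placeholder(tf.int32, [batch_size, seq_length])
embedding = tf.get_variable('embedding', [vocab_size, rnn_size])
initial_state = cell.zero_state(batch_size, tf.float32)

# The legacy rnn_decoder unrolls the graph in Python over a length-1200
# list of [batch, rnn_size] tensors; dynamic_rnn instead takes one
# [batch, time, depth] tensor and loops at run time via tf.while_loop,
# which is the change that cut usage from ~7.2GiB to ~1.1GiB in this thread.
inputs = tf.nn.embedding_lookup(embedding, input_data)
outputs, last_state = tf.nn.dynamic_rnn(
    cell, inputs, initial_state=initial_state, scope='rnnlm')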

0 reactions
ckcz123 commented, Nov 1, 2016

@fujimotomh Oh, it works! It uses only 1.1G of GPU memory now! Thanks for your advice!

Read more comments on GitHub >

Top Results From Across the Web

7 Tested Methods to Fix Your GPU Memory is Full Message
1. Adjust paging file settings for the game drive · 2. Update the graphics driver · 3. Use the 3GB switch · 4....
Read more >
Can I reduce my gpu's memory usage? - Tom's Hardware Forum
My gpu's memory usage is always at maximum for no reason, even when I'm not doing anything. Is there a way I can reduce the...
Read more >
How can I reduce GPU memory usage? #7969 - GitHub
CUDA Out of Memory Solutions · Reduce --batch-size · Reduce --img-size · Reduce model size, i.e. from YOLOv5x -> YOLOv5l -> YOLOv5m ->...
Read more >
Any idea how to reduce GPU Memory Usage or increase ...
Go to the video tab and change your graphics card "setting" . Mine had some generic graphics card, but in the drop down...
Read more >
Memory Usage Optimizations for GPU rendering
But if that is not an acceptable option you may use the following techniques to reduce the amount of VRAM used and eventually...
Read more >
