How to reduce GPU memory?
See original GitHub issueWhat a wonderful project! I have used it to solve some problems. But there is one problem that always bothers me.
In one of the cases, I have to use rnn_size=512
, num_layers=2
, seq_length=1200
.
Other arguments: batch_size=10
, num_epochs=50
, grad_clip=5.0
, and so on.
But it will allocate 7.23GiB in GPU, which is only 8GB-free.
So I just wonder if I can reduce GPU memory to 7GiB or less. If so, I can run it on GPU.
rnn_size
, num_layers
, seq_length
cannot be modified.
Here is some of the ouputs.
I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size: I tensorflow/core/common_runtime/bfc_allocator.cc:692] 22 Chunks of size 256 totalling 5.5KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 512 totalling 2.5KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7499 Chunks of size 2048 totalling 14.65MiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1087 Chunks of size 4096 totalling 4.25MiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 4608 totalling 4.5KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 6144 totalling 6.0KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 616 Chunks of size 8192 totalling 4.81MiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 9984 totalling 9.8KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 10240 totalling 40.0KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 12288 totalling 24.0KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 303 Chunks of size 14336 totalling 4.14MiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 198656 totalling 970.0KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 208384 totalling 203.5KiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 919 Chunks of size 8388608 totalling 7.18GiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 10775552 totalling 10.28MiB I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 14428160 totalling 13.76MiB I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 7.23GiB I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats: Limit: 7967745639 InUse: 7764832256 MaxInUse: 7764842496 NumAllocs: 60834 MaxAllocSize: 14428160
W tensorflow/core/common_runtime/bfc_allocator.cc:270] **************************************************************************************************** W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 8.00MiB. See logs for memory state. W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[1024,2048] E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Sorry for my poor English, and thanks a lot!
Issue Analytics
- State:
- Created 7 years ago
- Comments:5
Top GitHub Comments
@ckcz123 You almost have it.
dynamic_rnn
takes the input as a tensor and not a list. This works on my laptop with a seq_length of 1200.To confirm correctness, I think the best thing to do would be to run it with default parameters and see if you can get low loss on the training set. I would suspect this would work though as
rnn_decoder
anddynamic_rnn
claim have the same function.@fujimotomh Oh, it works! Only 1.1G usage of GPU memory! Thanks for your advice!