
Allocator (GPU_0_bfc) ran out of memory

See original GitHub issue

Hello, I’ve run into a problem: Allocator (GPU_0_bfc) ran out of memory trying to allocate 6.36GiB. The allocation summary reports Limit: 10508668109, InUse: 8296583424, MaxInUse: 9055883776, NumAllocs: 1190, MaxAllocSize: 6826283264, so this looks like an insufficient GPU memory situation.

I slightly modified your code so that it uses my own data (several inputs have been added), with subsampling_parameter=0 and batch_num=1. The error occurs when running this line in trainer.py:

_, L_out, L_reg, L_p, probs, labels, acc = self.sess.run(ops, {model.dropout_prob: 0.5})

My computer: RTX 2080 Ti with 11 GB of video memory, Intel i9 CPU with 16 GB of RAM, TensorFlow 1.12.0, CUDA 9.0 and cuDNN 7.4.

I don’t want to buy more graphics cards or reduce the depth of the network. Can you give me some suggestions to reduce the GPU memory usage of the program? Thank you!
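
A general TensorFlow 1.x mitigation worth trying before touching the network itself is to let the BFC allocator grow its GPU reservation on demand instead of claiming the whole card at startup; it does not shrink the model’s peak footprint, but it avoids some fragmentation-related failures and makes the real usage visible. A minimal sketch, assuming you control where the tf.Session is created in trainer.py:

import tensorflow as tf

# TF 1.x session configuration: allocate GPU memory on demand instead of
# reserving the full device up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Optionally cap the fraction of device memory TensorFlow may claim.
config.gpu_options.per_process_gpu_memory_fraction = 0.95

sess = tf.Session(config=config)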

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
HuguesTHOMAS commented, Jul 8, 2020

Hi @vvaibhav08,

You are right that this parameter is the last one that can influence the memory consumption after the batch size and the first subsampling / input radius. In my own experiments, I found that keep_ratio=0.8 is very effective even on extremely uneven datasets like Semantic3D. You could try to lower it even further, but I don’t think this would help much more than it already does. The reason is simple: if you look at the distribution of the neighborhood sizes in your dataset, it will usually look like the right half of a Gaussian, with a shape like this:

[Figure: histogram of neighborhood sizes, shaped like the right half of a Gaussian, with the largest 20% shaded in red and the next 20% in green]

The 20% largest neighborhoods are the area in red, and removing them nearly divides n_max by two. The next 20% largest neighborhoods are in green, and as you can see, the value of n_max at the 60th percentile is very close to the one at the 80th percentile. This is why, in my opinion, you won’t gain much by lowering the keep_ratio parameter any further.
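
To make the percentile argument concrete, here is a rough sketch (not code from the repository), assuming n_max is chosen as the keep_ratio-th percentile of the neighborhood-size histogram and that the sizes follow the half-Gaussian shape described above:

import numpy as np

# Synthetic neighborhood sizes shaped like the right half of a Gaussian.
rng = np.random.default_rng(0)
neighb_sizes = np.abs(rng.normal(0.0, 20.0, size=100_000)) + 5.0

# n_max taken as a percentile of the distribution; keep_ratio=0.8
# corresponds to the 80th percentile.
for keep_ratio in (1.0, 0.8, 0.6):
    n_max = np.percentile(neighb_sizes, 100 * keep_ratio)
    print(f"keep_ratio={keep_ratio:.1f} -> n_max ~ {n_max:.0f}")

# The drop from the 100th to the 80th percentile is much larger than the
# drop from the 80th to the 60th, which is why lowering keep_ratio below
# 0.8 gives diminishing returns.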

Anyway, I have another question for you that might solve your problem. I understand you are using your own data. Are you subsampling it before feeding it to the network? The first subsampling ratio is not applied automatically: the first layer of the network assumes the data has already been subsampled. So if you don’t subsample your data, that could explain your OOM errors. This is the job of the function load_subsampled_clouds that I have in all my datasets. If you are curious, I also gave a link to my SemanticKitti implementation, where I do this subsampling online on each input cloud before feeding it to the input generator.

I hope this helps.

Best, Hugues
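
The subsampling step mentioned above is, in essence, voxel-grid downsampling of the raw cloud before it enters the pipeline. The sketch below only illustrates that idea and is not the repository’s load_subsampled_clouds function; the helper name grid_subsample and the 6 cm grid size are hypothetical, and it simply keeps one averaged point per voxel:

import numpy as np

def grid_subsample(points, voxel_size):
    # Hypothetical helper: keep one averaged point per voxel so the first
    # network layer sees the point density it expects.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(voxel_idx, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    sums = np.zeros((counts.shape[0], points.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, points)
    return (sums / counts[:, None]).astype(points.dtype)

# Example: reduce a dense cloud with a 6 cm grid before building the inputs.
cloud = np.random.rand(1_000_000, 3).astype(np.float32) * 10.0
sub = grid_subsample(cloud, voxel_size=0.06)
print(cloud.shape, "->", sub.shape)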

Read more comments on GitHub >

Top Results From Across the Web

  • python - Allocator (GPU_0_bfc) ran out of memory trying to ...
    The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available...
  • Allocator (GPU_0_bfc) ran out of memory trying to allocate ...
    So I came across this problem when I try to run a TF-TRT optimized model in tensorflow 2.3. Model architecture is mobilenet-v2-fpnlite...
  • Allocator (GPU_0_bfc) ran out of memory trying to ... - GitHub
    I trained the model in a distributed environment, running two containers with an overlay network on two service machines. Every service has a Tesla...
  • failed to allocate memory for convolution redzone checking
    If you've run out of RAM, you'll need to restart the process (which should free up ... Allocator (GPU_0_bfc) ran out of memory...
  • "Out of memory" notification, when working in AutoCAD
    This error occurs because the computer ran out of usable memory before it was ... You can get out of memory errors while...
