
Getting an out-of-memory (OOM) error when running the model on the GPU.

See original GitHub issue

When I run the same inference command given in the readme.md, I get the following OOM error. I am running it on an Intel Core i5 7th-gen CPU with 8 GB RAM and an Nvidia 940MX GPU with 4 GB of memory, using Keras 1.2 and Theano 0.9.0.

THEANO_FLAGS=optimizer=fast_compile,device=gpu python main.py --mode inference --config sessions/001/config.json --noisy_input_path data/NSDTSEA/noisy_testset_wav --clean_input_path data/NSDTSEA/clean_testset_wav

Using TensorFlow backend.
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Loading model from epoch: 144
2018-02-18 17:40:19.280369: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-18 17:40:19.486539: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-18 17:40:19.486944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.2415
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.67GiB
2018-02-18 17:40:19.486961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
Performing inference…
Denoising: p232_001.wav
  0%|          | 0/2 [00:00<?, ?it/s]
2018-02-18 17:40:23.358141: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.01GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-02-18 17:40:33.358618: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 696.00MiB. Current allocation summary follows.
.
.
Stats:
Limit:        3605921792
InUse:        3542674176
MaxInUse:     3542674176
NumAllocs:    973
MaxAllocSize: 464153344
.
.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[192,128,2770,1]
Allocator (GPU_0_bfc) ran out of memory trying to allocate 389.18MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

@drethage How can I solve this error? If you need any more information, please let me know.
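
A rough back-of-the-envelope check against the allocator stats in the log above shows why this fails on a 4 GB card. Assuming float32 activations (4 bytes per element, an assumption not stated in the report), the single tensor of shape [192, 128, 2770, 1] already needs roughly 260 MiB, while the allocator has only about 60 MiB of headroom left under its ~3.36 GiB limit:

# Back-of-the-envelope check against the BFC allocator stats reported above.
# Assumes float32 activations (4 bytes per element).
shape = (192, 128, 2770, 1)       # tensor the allocator failed to place
tensor_bytes = 4
for dim in shape:
    tensor_bytes *= dim

limit_bytes = 3605921792          # "Limit"  from the allocator stats
in_use_bytes = 3542674176         # "InUse"  from the allocator stats
headroom = limit_bytes - in_use_bytes

print("tensor size : %6.1f MiB" % (tensor_bytes / 2.0**20))   # ~259.7 MiB
print("headroom    : %6.1f MiB" % (headroom / 2.0**20))       # ~60.3 MiB

Even the smallest failed request reported in the log (389.18 MiB) is several times larger than that remaining headroom, so the model with the default configuration simply does not fit on this GPU.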

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
DillipKS commented, Feb 19, 2018

I found the trick to solve this error. With the default parameters in the config.json file, the model becomes too big for the limited GPU memory to hold. To reduce the model size, you just need to tweak one parameter: "dilations" in the "model" dictionary, from the default value of 9 to something smaller like 4 or 5. The config.json file to modify is the one in the parent directory itself, NOT the one in sessions/001/config.json. Mind you, the given pretrained model won't work for inference after changing the 'dilations' value, so you will first have to train a new model on the given training dataset and then do the inference. @wuweijia1994
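
For reference, a minimal sketch of that change, assuming the layout described in the comment (a top-level "model" dictionary containing a "dilations" entry; the exact keys in your copy of config.json may differ):

import json

# Shrink the model by lowering "dilations" in the top-level config.json
# (NOT the one under sessions/001/), then retrain before running inference.
with open("config.json") as f:
    config = json.load(f)

config["model"]["dilations"] = 5   # default is 9; 4 or 5 for a ~4 GB GPU

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)

As the comment points out, the pretrained model in sessions/001 no longer matches this configuration, so a new model has to be trained before running inference.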

0 reactions
atinesh-s commented, Jun 16, 2020

@jordipons I am unable to perform inference on a machine with an Nvidia Tesla T4 (16 GB); I am getting an out-of-memory error.

Read more comments on GitHub >

Top Results From Across the Web

Out of Memory (OOM) when repeatedly running large models
Any advice for freeing up GPU memory after training a large model (e.g., roberta-large)?

CUDA Out Of Memory (OOM) error while using GPU?
The model is training perfectly when using my 12 CPU cores, but when assigned to my NVIDIA GTX card, an OOM error stating...

Resolving CUDA Being Out of Memory With Gradient ...
The issue is, to train the model using GPU, you need the error between the labels and predictions, and for the error, you...

Why am I getting GPU ran out of memory error here?
Usually, when OOM errors take place, it is because the batch_size is too big or your VRAM is too small. In your case,...

Solving "CUDA out of memory" Error - Kaggle
I was facing the same issue; wrapping the GPU-intensive code (like model training or image processing) in a with torch.no_grad(): block seems to fix... (a minimal sketch of this pattern follows after this list)
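
The last result refers to PyTorch's torch.no_grad() context manager rather than the Keras/Theano stack used in this issue; a minimal, self-contained sketch of that pattern, with a hypothetical model and input, looks like this:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical model and input, purely for illustration.
model = torch.nn.Linear(512, 512).to(device)
model.eval()
batch = torch.randn(64, 512, device=device)

# Inside torch.no_grad() no gradient bookkeeping or intermediate
# activations are kept, which lowers GPU memory use during inference.
with torch.no_grad():
    output = model(batch)

Disabling gradient tracking during inference keeps PyTorch from storing intermediate activations for backpropagation, which is often enough to avoid OOM errors at prediction time.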
