
Getting an out-of-memory (OOM) error when running the model on the GPU.

See original GitHub issue

When I run the same inference command given in the readme.md, I get the following OOM error. I am running it on an Intel Core i5 7th-gen CPU with 8 GB RAM and an Nvidia 940MX GPU with 4 GB of memory, using Keras 1.2 and Theano 0.9.0.

THEANO_FLAGS=optimizer=fast_compile,device=gpu python main.py --mode inference --config sessions/001/config.json --noisy_input_path data/NSDTSEA/noisy_testset_wav --clean_input_path data/NSDTSEA/clean_testset_wav

Using TensorFlow backend.
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Loading model from epoch: 144
2018-02-18 17:40:19.280369: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-18 17:40:19.486539: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-18 17:40:19.486944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.2415
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.67GiB
2018-02-18 17:40:19.486961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
Performing inference…
Denoising: p232_001.wav
  0%|          | 0/2 [00:00<?, ?it/s]
2018-02-18 17:40:23.358141: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.01GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-02-18 17:40:33.358618: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 696.00MiB. Current allocation summary follows.
.
.
Stats:
Limit:        3605921792
InUse:        3542674176
MaxInUse:     3542674176
NumAllocs:    973
MaxAllocSize: 464153344
.
.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[192,128,2770,1]
Allocator (GPU_0_bfc) ran out of memory trying to allocate 389.18MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

@drethage How can I solve this error? If you need any more information, please let me know.
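
A rough back-of-the-envelope check against the allocator stats in the log above shows why this fails on a 4 GB card. Assuming float32 activations (4 bytes per element, an assumption not stated in the report), the single tensor of shape [192, 128, 2770, 1] already needs roughly 260 MiB, while the allocator has only about 60 MiB of headroom left under its ~3.36 GiB limit:

# Back-of-the-envelope check against the BFC allocator stats reported above.
# Assumes float32 activations (4 bytes per element).
shape = (192, 128, 2770, 1)       # tensor the allocator failed to place
tensor_bytes = 4
for dim in shape:
    tensor_bytes *= dim

limit_bytes = 3605921792          # "Limit"  from the allocator stats
in_use_bytes = 3542674176         # "InUse"  from the allocator stats
headroom = limit_bytes - in_use_bytes

print("tensor size : %6.1f MiB" % (tensor_bytes / 2.0**20))   # ~259.7 MiB
print("headroom    : %6.1f MiB" % (headroom / 2.0**20))       # ~60.3 MiB

Even the smallest failed request reported in the log (389.18 MiB) is several times larger than that remaining headroom, so the model with the default configuration simply does not fit on this GPU.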

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
DillipKS commented, Feb 19, 2018

I found the trick to solve this error. With the default parameters in the config.json file, the model becomes too big for the limited GPU memory to hold. To reduce the model size, you just need to tweak one parameter: "dilations" in the "model" dictionary, from the default value of 9 to something smaller like 4 or 5. The config.json file to modify is the one in the parent directory itself, NOT the one in sessions/001/config.json. Mind you, the given pretrained model won't work for inference after changing the 'dilations' value, so you will first have to train a new model on the given training dataset and then do the inference. @wuweijia1994
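
For reference, a minimal sketch of that change, assuming the layout described in the comment (a top-level "model" dictionary containing a "dilations" entry; the exact keys in your copy of config.json may differ):

import json

# Shrink the model by lowering "dilations" in the top-level config.json
# (NOT the one under sessions/001/), then retrain before running inference.
with open("config.json") as f:
    config = json.load(f)

config["model"]["dilations"] = 5   # default is 9; 4 or 5 for a ~4 GB GPU

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)

As the comment points out, the pretrained model in sessions/001 no longer matches this configuration, so a new model has to be trained before running inference.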

0 reactions
atinesh-s commented, Jun 16, 2020

@jordipons I am unable to perform inference on a machine with an Nvidia Tesla T4 (16 GB); I am getting an out-of-memory error.

Read more comments on GitHub >

Top Results From Across the Web

Out of Memory (OOM) when repeatedly running large models
Any advice for freeing up GPU memory after training a large model (e.g., roberta-large)?

CUDA Out Of Memory (OOM) error while using GPU?
The model is training perfectly when using my 12 CPU cores, but when assigned to my NVIDIA GTX card, an OOM error stating...

Resolving CUDA Being Out of Memory With Gradient ...
The issue is, to train the model using GPU, you need the error between the labels and predictions, and for the error, you...

Why am I getting GPU ran out of memory error here?
Usually, when OOM errors take place, it is because the batch_size is too big or your VRAM is too small. In your case,...

Solving "CUDA out of memory" Error - Kaggle
I was facing the same issue; wrapping the GPU-intensive code (like model training or image processing) in a with torch.no_grad(): block seems to fix... (a minimal sketch of this pattern follows after this list)
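
The last result refers to PyTorch's torch.no_grad() context manager rather than the Keras/Theano stack used in this issue; a minimal, self-contained sketch of that pattern, with a hypothetical model and input, looks like this:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical model and input, purely for illustration.
model = torch.nn.Linear(512, 512).to(device)
model.eval()
batch = torch.randn(64, 512, device=device)

# Inside torch.no_grad() no gradient bookkeeping or intermediate
# activations are kept, which lowers GPU memory use during inference.
with torch.no_grad():
    output = model(batch)

Disabling gradient tracking during inference keeps PyTorch from storing intermediate activations for backpropagation, which is often enough to avoid OOM errors at prediction time.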
