Getting an out-of-memory (OOM) error when running the model on GPU.
When I run the inference command given in the readme.md, I get the following OOM error. I am running it on an Intel Core i5 7th-gen CPU with 8 GB RAM and an NVIDIA GeForce 940MX 4 GB GPU, with Keras 1.2 and Theano 0.9.0.
THEANO_FLAGS=optimizer=fast_compile,device=gpu python main.py --mode inference --config sessions/001/config.json --noisy_input_path data/NSDTSEA/noisy_testset_wav --clean_input_path data/NSDTSEA/clean_testset_wav
Using TensorFlow backend.
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Loading model from epoch: 144
2018-02-18 17:40:19.280369: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-18 17:40:19.486539: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-18 17:40:19.486944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.2415
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.67GiB
2018-02-18 17:40:19.486961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
Performing inference…
Denoising: p232_001.wav
0%| | 0/2 [00:00<?, ?it/s]
2018-02-18 17:40:23.358141: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.01GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-02-18 17:40:33.358618: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 696.00MiB. Current allocation summary follows.
.
.
Stats:
Limit: 3605921792
InUse: 3542674176
MaxInUse: 3542674176
NumAllocs: 973
MaxAllocSize: 464153344
.
.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[192,128,2770,1]
Allocator (GPU_0_bfc) ran out of memory trying to allocate 389.18MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
@drethage How can I solve this error? If you need any more information, please let me know.
Top GitHub Comments
I found the trick to solve this error. With the default parameters in config.json, the model is simply too big for the limited GPU memory. To reduce the model size, you only need to lower one parameter: "dilations" in the "model" dictionary, from its default value of 9 to something smaller like 4 or 5. The config.json to modify is the one in the parent directory itself, NOT the one in sessions/001/config.json. Note that the provided pretrained model will not work for inference after changing the "dilations" value, so you will first have to train a new model on the provided training dataset and then run inference. @wuweijia1994
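For reference, a minimal sketch of that config tweak, assuming config.json has a top-level "model" dictionary with a scalar "dilations" field as described above (the surrounding fields in the real file are omitted here, and the exact structure may differ by repository version):

```python
import json

# Edit the config.json in the repository root (NOT sessions/001/config.json).
CONFIG_PATH = "config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

# Default is 9; lowering it shrinks the model so it fits into ~4 GB of GPU memory.
config["model"]["dilations"] = 4

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=4)
```

After changing the value, retrain before running inference: the pretrained weights in sessions/001 were built with the original dilations setting and will no longer match the smaller architecture.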
@jordipons I am unable to perform inference on a machine with an NVIDIA Tesla T4 (16 GB); I am getting an out-of-memory error.