Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
See original GitHub issue. To get help from the community, we encourage using Stack Overflow and the tensorflow.js tag.
TensorFlow.js version
{ 'tfjs-core': '1.0.3', 'tfjs-data': '1.0.3', 'tfjs-layers': '1.0.3', 'tfjs-converter': '1.0.3', tfjs: '1.0.3', 'tfjs-node': '1.0.2' }
Browser version
Running on Node.js, Ubuntu 18.04
$ nvidia-smi
Fri Mar 29 19:25:37 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P8     9W /  N/A |    879MiB /  7952MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
Describe the problem or feature request
I'm unable to use cuDNN convolutional layers in my model on tfjs-node-gpu.
This is possibly related to known issues with the RTX series; in a TensorFlow workaround thread there is a suggestion to use
config.gpu_options.allow_growth = True
Is there such an option in TensorFlow.js?
Code to reproduce the bug / link to feature request
const tf = require('@tensorflow/tfjs-node-gpu');
const model = tf.sequential({
layers: [
tf.layers.conv2d({
inputShape:[32, 32, 3],
filters: 32,
kernelSize: [3, 3],
activation: 'relu',
}),
    tf.layers.maxPooling2d({poolSize: [2, 2]}),
],
});
// predict() returns a Tensor synchronously, so print it directly.
model.predict(tf.randomNormal([4, 32, 32, 3])).print();
$ node index.js
2019-03-29 19:22:37.112495: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-03-29 19:22:37.249964: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-29 19:22:37.250443: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3aa4000 executing computations on platform CUDA. Devices:
2019-03-29 19:22:37.250458: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2019-03-29 19:22:37.271245: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-03-29 19:22:37.271958: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3aa2750 executing computations on platform Host. Devices:
2019-03-29 19:22:37.271972: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-03-29 19:22:37.272241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.44 pciBusID: 0000:01:00.0 totalMemory: 7.77GiB freeMemory: 6.80GiB
2019-03-29 19:22:37.272275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-29 19:22:37.273295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 19:22:37.273308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-03-29 19:22:37.273314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-03-29 19:22:37.273435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6612 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-03-29 19:22:38.761993: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-29 19:22:38.763178: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:132
    throw ex;
    ^
Error: Invalid TF_Status: 2
Message: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    at NodeJSKernelBackend.executeSingleOutput (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-node-gpu/dist/nodejs_kernel_backend.js:192:43)
    at NodeJSKernelBackend.conv2d (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-node-gpu/dist/nodejs_kernel_backend.js:700:21)
    at environment_1.ENV.engine.runKernel.x (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/ops/conv.js:152:27)
    at /home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:171:26
    at Engine.scopedRun (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:126:23)
    at Engine.runKernel (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:169:14)
    at conv2d_ (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/ops/conv.js:151:40)
    at Object.conv2d (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/ops/operation.js:46:29)
    at /home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-layers/dist/layers/convolutional.js:198:17
    at /home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:116:22
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 3
- Comments: 8 (1 by maintainers)
Top GitHub Comments
As explained in https://github.com/tensorflow/tfjs/issues/671#issuecomment-494832790
There is a workaround: set the environment variable
export TF_FORCE_GPU_ALLOW_GROWTH=true
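If you would rather not export the variable in your shell, a minimal sketch (assuming the flag is read when the native TensorFlow library initializes) is to set it from Node.js before requiring the backend:

```javascript
// Sketch: set the flag before loading the TF backend, since the native
// library reads TF_FORCE_GPU_ALLOW_GROWTH during initialization.
process.env.TF_FORCE_GPU_ALLOW_GROWTH = 'true';

// Only require tfjs-node-gpu after the variable is set:
// const tf = require('@tensorflow/tfjs-node-gpu');

console.log(process.env.TF_FORCE_GPU_ALLOW_GROWTH);
```

The ordering matters: requiring `@tensorflow/tfjs-node-gpu` first would initialize the GPU backend before the flag is visible.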
@bobiblazeski, I switched over to trying on Windows and finally just got this working. I had to drop down to tfjs-node-gpu version 0.3.2 due to node-gyp issues.
However, once I finally got it to install, I later ran into this same cuDNN issue! Fortunately, using CUDA 9.0 (needed for 0.3.2 compatibility) I got a better error message before the "This is probably because cuDNN failed to initialize…" message, stating that tfjs-node-gpu was built against cuDNN version 7.2. Once I downloaded that version, everything worked.
I haven't gone back to see if I could get it to work on the Linux install, but I'm hoping that this could just be a cuDNN version incompatibility issue that you could experiment with. Luckily cuDNN doesn't have an install/uninstall process; it's simply a matter of copying the extracted files into a dedicated directory that you include in your system path.
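To check which cuDNN version is on a Linux box, the version macros in `cudnn.h` can be grepped. A sketch follows; the header path varies by install (commonly `/usr/include/cudnn.h` or `/usr/local/cuda/include/cudnn.h`), and a sample header fragment is simulated here so the commands are self-contained:

```shell
# Simulate a cudnn.h fragment; on a real system, point grep at the
# actual header, e.g. /usr/include/cudnn.h.
printf '#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 2\n#define CUDNN_PATCHLEVEL 1\n' > /tmp/cudnn_sample.h

# Extract the version macros (here: 7.2.1).
grep -E 'CUDNN_(MAJOR|MINOR|PATCHLEVEL)' /tmp/cudnn_sample.h
```

Matching the major.minor reported here against what the tfjs-node-gpu binary was built against is the quickest way to spot the incompatibility described above.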
I hope that helps give you some possible direction!