
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

See original GitHub issue

To get help from the community, we encourage using Stack Overflow and the tensorflow.js tag.

TensorFlow.js version

```
{ 'tfjs-core': '1.0.3', 'tfjs-data': '1.0.3', 'tfjs-layers': '1.0.3', 'tfjs-converter': '1.0.3', tfjs: '1.0.3', 'tfjs-node': '1.0.2' }
```

Browser version

N/A; running on Node.js on Ubuntu 18.04.

```
$ nvidia-smi
Fri Mar 29 19:25:37 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P8     9W /  N/A |    879MiB /  7952MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
```

Describe the problem or feature request

I'm unable to use cuDNN convolutional layers in my model on tfjs-node-gpu. This is possibly related to known issues with the RTX series; in this tensorflow workaround there is a suggestion to use `config.gpu_options.allow_growth = True`.

Is there such an option in TensorFlow.js?

Code to reproduce the bug / link to feature request

```js
const tf = require('@tensorflow/tfjs-node-gpu');

const model = tf.sequential({
  layers: [
    tf.layers.conv2d({
      inputShape: [32, 32, 3],
      filters: 32,
      kernelSize: [3, 3],
      activation: 'relu',
    }),
    // maxPooling2d takes a config object, not a bare array
    tf.layers.maxPooling2d({ poolSize: [2, 2] }),
  ],
});

// predict() returns a Tensor synchronously, so no .then() is needed
model.predict(tf.randomNormal([4, 32, 32, 3])).print();
```

```
$ node index.js
2019-03-29 19:22:37.112495: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-03-29 19:22:37.249964: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-29 19:22:37.250443: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3aa4000 executing computations on platform CUDA. Devices:
2019-03-29 19:22:37.250458: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2019-03-29 19:22:37.271245: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-03-29 19:22:37.271958: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3aa2750 executing computations on platform Host. Devices:
2019-03-29 19:22:37.271972: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-29 19:22:37.272241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.44
pciBusID: 0000:01:00.0
totalMemory: 7.77GiB freeMemory: 6.80GiB
2019-03-29 19:22:37.272275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-29 19:22:37.273295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 19:22:37.273308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-03-29 19:22:37.273314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-03-29 19:22:37.273435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6612 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-03-29 19:22:38.761993: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-29 19:22:38.763178: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:132
            throw ex;
            ^
```

```
Error: Invalid TF_Status: 2
Message: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    at NodeJSKernelBackend.executeSingleOutput (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-node-gpu/dist/nodejs_kernel_backend.js:192:43)
    at NodeJSKernelBackend.conv2d (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-node-gpu/dist/nodejs_kernel_backend.js:700:21)
    at environment_1.ENV.engine.runKernel.x (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/ops/conv.js:152:27)
    at /home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:171:26
    at Engine.scopedRun (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:126:23)
    at Engine.runKernel (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:169:14)
    at conv2d_ (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/ops/conv.js:151:40)
    at Object.conv2d (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/ops/operation.js:46:29)
    at /home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-layers/dist/layers/convolutional.js:198:17
    at /home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:116:22
```

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 3
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

piercus commented, Oct 1, 2019 (7 reactions)

As explained in https://github.com/tensorflow/tfjs/issues/671#issuecomment-494832790

There is a workaround: set the environment variable `export TF_FORCE_GPU_ALLOW_GROWTH=true`.
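The same workaround can also be applied from inside the script itself, as long as the variable is set before the native binding loads. A minimal sketch, under the assumption that a `process.env` change made before `require('@tensorflow/tfjs-node-gpu')` is visible to the TensorFlow binding when it initializes the GPU:

```javascript
// TF_FORCE_GPU_ALLOW_GROWTH must be in the environment before the native
// TensorFlow binding is loaded, so set it before requiring tfjs-node-gpu.
function enableGpuMemoryGrowth() {
  if (process.env.TF_FORCE_GPU_ALLOW_GROWTH === undefined) {
    process.env.TF_FORCE_GPU_ALLOW_GROWTH = 'true';
  }
  return process.env.TF_FORCE_GPU_ALLOW_GROWTH;
}

enableGpuMemoryGrowth();
// const tf = require('@tensorflow/tfjs-node-gpu'); // load the binding afterwards
```

Exporting the variable in the shell (or in the `scripts` entry of `package.json`) is equivalent and avoids any ordering concerns.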

adwellj commented, Apr 10, 2019 (2 reactions)

@bobiblazeski, I punted over to trying on Windows and finally just got this working. I had to drop down to tfjs-node-gpu version 0.3.2 due to node-gyp issues.

However, once I finally got it to install, I later ran into this same cuDNN issue! Fortunately, using CUDA 9.0 (needed for 0.3.2 compatibility) I got a better error message before the "This is probably because cuDNN failed to initialize…" message, stating that tfjs-node-gpu was built against cuDNN version 7.2. Once I downloaded that version, everything worked.

I haven't gone back to see if I could get it to work on the Linux install, but I'm hoping that this could just be a cuDNN version incompatibility issue that you could experiment with. Luckily cuDNN doesn't have an install / uninstall process; it's simply a matter of copying the extracted files into a dedicated directory that you include in your system path.

I hope that helps give you some possible direction!
